Turn the set of permutations of $n$ objects into a graph $G_n$ by
connecting two permutations that differ by one transposition, and
let $\sigma_t$ be the simple random walk on this graph. In a previous paper,
Berestycki and Durrett [In Discrete Random Walks (2005) 17-26]
showed that the limiting behavior of the distance from the identity
at time $cn/2$ has a phase transition at $c = 1$. Here we investigate some
consequences of this result for the geometry of $G_n$. Our first result
can be interpreted as a breakdown for the Gromov hyperbolicity of
the graph as seen by the random walk, which occurs at a critical
radius equal to $n/4$. Let $T$ be a triangle formed by the origin and
two points sampled independently from the hitting distribution on
the sphere of radius $an$ for a constant $0 < a < 1$. Then when $a < 1/4$,
if the geodesics are suitably chosen, with high probability $T$ is $\delta$-thin
for some $\delta > 0$, whereas it is always $O(n)$-thick when $a > 1/4$. We
also show that the hitting distribution of the sphere of radius $an$
is asymptotically singular with respect to the uniform distribution.
Finally, we prove that the critical behavior of this Gromov-like
hyperbolicity constant persists if the two endpoints are sampled from
the uniform measure on the sphere of radius $an$. However, in this
case, the critical radius is $a = 1 - \log 2$.

1. Introduction. Let $\mathcal{S}_n$ be the set of permutations of $\{1, 2, \ldots, n\}$, and
let $\sigma_t$ be the continuous-time random walk on $\mathcal{S}_n$ that results when randomly
chosen transpositions are performed at rate 1. Let $d(\sigma_t)$ be the distance
from the identity $I$ at time $t$, that is, the minimum number of transpositions
needed to return to $I$. In a previous paper, Berestycki and Durrett [3] showed

Theorem 0. As $n \to \infty$, $d(\sigma_{cn/2})/n \to_p u(c)$, where

(1.1)  $u(c) = 1 - \sum_{k=1}^{\infty} \frac{1}{c} \frac{k^{k-2}}{k!} (c e^{-c})^k.$

Although it is not easy to see from the formula, $u(c) = c/2$ for
$c < 1$ and $u(c) < c/2$ for $c > 1$.
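
The dichotomy in the last sentence can be checked numerically. The following is a minimal sketch, assuming that the series (1.1) is simply truncated after a fixed number of terms; the function name `u` and the cutoff of 5000 terms are choices made for this illustration only. Terms are evaluated in log space to avoid overflow in $k^{k-2}/k!$.

```python
import math

def u(c, terms=5000):
    """Truncated series (1.1): u(c) = 1 - sum_k (1/c) k^(k-2)/k! (c e^{-c})^k."""
    total = 0.0
    for k in range(1, terms + 1):
        # log of the k-th term; math.lgamma(k + 1) equals log(k!)
        log_term = ((k - 2) * math.log(k) - math.lgamma(k + 1)
                    + k * (math.log(c) - c) - math.log(c))
        total += math.exp(log_term)
    return 1.0 - total

if __name__ == "__main__":
    for c in (0.3, 0.6, 0.9, 1.5, 2.0):
        print(f"c = {c:.1f}   u(c) = {u(c):.6f}   c/2 = {c / 2:.6f}")
```

The truncation is only reliable away from $c = 1$, where the terms decay like $k^{-5/2}$ and convergence is slow.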

We can think of $\sigma_t$ as a random walk on the graph $G_n$ with vertices $\mathcal{S}_n$
and edges connecting two permutations that differ by one transposition, so
that $G_n$ is the Cayley graph of $\mathcal{S}_n$ associated with the set of generators
$S = \{\text{all transpositions}\}$. Theorem 0 was proved by establishing a
connection with Erdős-Rényi random graphs. The phase transition observed for $\sigma_t$
is then related to the well-known double jump of the size of connected
components of $G(n, c/n)$ at $c = 1$. [Here and in all that follows, $G(n,p)$ denotes
the Erdős-Rényi random graph with parameters $n$ and $p$, i.e., a random
graph on $n$ vertices where each edge is present independently of the others
with probability $p$.] We refer the reader to Janson et al. [8] for this and other
facts about Erdős-Rényi random graphs.
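
The coupling between the walk and the random graph can be illustrated by a small simulation. This is a sketch under stated assumptions: time is discrete (one uniformly chosen transposition per step), and the names `simulate`, `perm` and `parent` are hypothetical helpers introduced here. Each transposition $(i\,j)$ is applied to the permutation and simultaneously added as an edge $\{i,j\}$ of a graph stored in a union-find structure. Since every cycle of the permutation stays inside one graph component, the distance $n - \#\text{cycles}$ never exceeds $n - \#\text{components}$.

```python
import random

def simulate(n, steps, seed=0):
    """Run `steps` random transpositions; return (n - #cycles, n - #components)."""
    rng = random.Random(seed)
    perm = list(range(n))      # sigma_0 = identity
    parent = list(range(n))    # union-find forest for the coupled graph

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for _ in range(steps):
        i, j = rng.sample(range(n), 2)        # uniform transposition (i j)
        perm[i], perm[j] = perm[j], perm[i]   # multiply sigma by (i j)
        parent[find(i)] = find(j)             # add the edge {i, j}

    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            v = start
            while not seen[v]:
                seen[v] = True
                v = perm[v]
    components = sum(1 for v in range(n) if find(v) == v)
    return n - cycles, n - components

if __name__ == "__main__":
    n = 2000
    for c in (0.5, 1.0, 2.0):
        d, g = simulate(n, int(c * n / 2), seed=1)
        print(f"c = {c}:  d/n = {d / n:.4f}   (n - components)/n = {g / n:.4f}")
```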

In this paper we try to investigate some of the geometric implications of 
Theorem 0. We find a new connection between the speed of a random walk 
and the Gromov hyperbolicity of the space in which the random walk is 
evolving. 

Organization of the paper. In Sections 1.1, 1.2, 1.3 we present our re- 
sults. The proofs of these results can be found successively in Sections 2-8. 
Each proof is preceded by a restatement of the corresponding theorem for 
convenience, and by an informal proof which outlines the main ideas used. 

1.1. Asymptotic hyperbolicity. The notion of hyperbolicity for a discrete 
structure such as a group is a notion that goes back to Gromov [7] . As there 
is no derivative, and thus no curvature available in a discrete space, the idea 
is to define what hyperbolic means using only elementary properties of the 
space. 

One way to do this is as follows. Let $(X, |\cdot|)$ be a metric space, where
$|x - y|$ denotes the distance between $x$ and $y$. For points $x, y$ and $p$ in $X$,
define the Gromov inner product by

$2(x|y)_p = |x - p| + |y - p| - |x - y|.$

$(x|y)_p$ thus measures how well the union of the geodesic segments $[p,x] \cup [p,y]$
approximates a geodesic between $x$ and $y$. Gromov's original definition of
hyperbolic spaces is as follows. Call $X$ $\delta$-hyperbolic if

(1.2)  $(x|z)_p \ge (x|y)_p \wedge (y|z)_p - \delta$
for all $x, y, z$ and $p$. This definition is not very intuitive at first, but
fortunately there is an equivalent definition, which can be formulated using the
notion of $\delta$-thin triangle. A triangle $(x,y,z)$ with geodesic sides $s_1, s_2, s_3$ is
said to be $\delta$-thin if any side, say $s_1$, lies entirely within distance at most $\delta$
of the two remaining sides:

$s_1 \subset \{x \in X : d(x, s_2 \cup s_3) \le \delta\}.$

The space is called $\delta$-hyperbolic if all geodesic triangles are $\delta$-thin, and it
is simply called hyperbolic if it is $\delta$-hyperbolic for some $\delta \ge 0$ (when $\delta = 0$,
the space isometrically embeds into a tree). It is not immediate, but not
hard to check, that if all triangles $(x,y,z)$ are $\delta$-thin, then (1.2) is satisfied
for some number $\delta'$ that may differ by a constant factor from $\delta$. Conversely,
in a space where (1.2) is satisfied for all points $(p,x,y,z)$, all triangles are
$\delta'$-thin, where $\delta'$ may differ from $\delta$ by a constant factor.

Of course a bounded space (in particular, a finite space such as $\mathcal{S}_n$) is
trivially hyperbolic, but we will be interested in situations where the constant
$\delta$ may or may not stay bounded as the size of the space tends to $\infty$.

Our first result makes the connection between Theorem 0 and Gromov
hyperbolic spaces, where we look at the two definitions of hyperbolic
constants suitably weakened. For $0 < a < 1$, let $\partial B(an)$ be the sphere of radius
$an$, that is, the set of points at distance $\lfloor an \rfloor$ from the origin. We let $\nu$ be
the hitting distribution of $\partial B(an)$ by $\sigma_t$, that is, $\nu$ is the law on $\partial B(an)$ of
$\sigma_T$ where $T = \inf\{t \ge 0 : d(\sigma_t) = \lfloor an \rfloor\}$.

Theorem 1. Let $x, y$ be sampled from $\nu$ independently, and set $p = I$,
the identity element.

1. If $a < 1/4$, then there is some $\delta < \infty$ (depending only on $a$), such that

$E(x|y)_p \le \delta.$

Moreover, with probability asymptotically 1, there is a geodesic between
$x$ and $y$ that comes within expected distance $\delta' < \infty$ of $p$.

2. If $a > 1/4$, then

$E(x|y)_p \sim \delta n$

for some $0 < \delta < \infty$. Moreover, no geodesic between $x$ and $y$ can approach
$p$ closer than $\delta' n$ with probability asymptotically 1, where $0 < \delta' < \infty$.

In the statement of the theorem and in the rest of the paper, $a_n \sim b_n$
means that $a_n/b_n \to 1$.

Remark. It follows immediately from Theorem 1 that when $a < 1/4$,
with probability asymptotically 1,

$(x|z)_p \ge (x|y)_p \wedge (y|z)_p - \delta$






N. BERESTYCKI 



for independent $x, y, z$ sampled from $\nu$, hence the idea that definition 1 of
hyperbolicity is satisfied "asymptotically $\nu$-almost surely." The statement
about the geodesics shows that definition 2 is satisfied "asymptotically
$\nu$-almost surely" when $a < 1/4$.

At this point we should emphasize that the result in Theorem 1 involves
hyperbolic constants that are different from the standard definitions
discussed above in several important ways. The most obvious difference comes
from the randomness of $x$ and $y$, and from the fact that the roles played by
$x, y$ and $p$ are somewhat different. Here $p$ is a fixed reference point, whereas
Gromov's definition requires that every triangle should be thin. Another
issue is that, corresponding to the second definition of hyperbolicity with
thin triangles, we show that there exists a certain geodesic between $x$ and
$y$ having the desired properties. As we will see below in Theorem 6, there
may be a great many geodesics between two given points in $\mathcal{S}_n$. More
importantly, these geodesics can be far apart, as the following concrete
example shows:

$\sigma$              (1 14 5 11) (2) (3 9) (4 13 6) (7 12 8) (10)
$\pi_1$               (1) (14) (5) (11) (2) (3) (9) (4 13 6) (7 12 8) (10)
$\pi_2$               (1 14 5 11) (2) (3 9) (4) (13) (6) (7) (12) (8) (10)
$\pi_1\pi_2^{-1}$     (11 5 14 1) (2) (9 3) (4 13 6) (7 12 8) (10)

Since for any permutation $\pi$ we have $d(\pi) = n - \#\{\text{cycles of } \pi\}$, $d(\sigma) = 8$.
$\pi_1$ and $\pi_2$ are on two geodesics from $I$ to $\sigma$, but $d(\pi_1, \pi_2) = d(\pi_1\pi_2^{-1}) = 8$.
In general if $d(\sigma) = cn/2$ with $c < 1$, and we divide the cycles at random
into two groups, we can define $\pi_1$ to have cycle structure given by the first
group of $\sigma$ staying as it is and the second completely broken into cycles of
length 1. If we define $\pi_2$ by the exchange of the two groups, then we will
have $d(\sigma, \pi_i) = cn/4$ and $d(\pi_1, \pi_2) = cn/2$.
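
The distances in this example can be verified mechanically. The sketch below is only an illustration: the helper names (`from_cycles`, `compose`, `inverse`, `dist_to_identity`, `dist`) are introduced here and permutations are stored as dictionaries on $\{1, \ldots, 14\}$. It uses $d(\pi) = n - \#\text{cycles}$ and $d(\pi, \rho) = d(\pi\rho^{-1})$, and also evaluates the Gromov product $(\pi_1|\pi_2)_I$.

```python
def from_cycles(cycles, n):
    """Permutation of {1..n} as a dict; points not listed are fixed."""
    perm = {v: v for v in range(1, n + 1)}
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            perm[a] = b
    return perm

def compose(p, q):
    """(p q)(v) = p(q(v))."""
    return {v: p[q[v]] for v in q}

def inverse(p):
    return {w: v for v, w in p.items()}

def dist_to_identity(p):
    """d(p) = n - (number of cycles of p)."""
    seen, cycles = set(), 0
    for start in p:
        if start not in seen:
            cycles += 1
            v = start
            while v not in seen:
                seen.add(v)
                v = p[v]
    return len(p) - cycles

def dist(p, q):
    return dist_to_identity(compose(p, inverse(q)))

n = 14
sigma = from_cycles([(1, 14, 5, 11), (3, 9), (4, 13, 6), (7, 12, 8)], n)
pi1 = from_cycles([(4, 13, 6), (7, 12, 8)], n)
pi2 = from_cycles([(1, 14, 5, 11), (3, 9)], n)

# Gromov product of pi1 and pi2 based at the identity
gromov = (dist_to_identity(pi1) + dist_to_identity(pi2) - dist(pi1, pi2)) / 2

if __name__ == "__main__":
    print("d(sigma)      =", dist_to_identity(sigma))
    print("d(pi1, pi2)   =", dist(pi1, pi2))
    print("(pi1|pi2)_I   =", gromov)
```

Both $\pi_1$ and $\pi_2$ satisfy $d(I, \pi_i) + d(\pi_i, \sigma) = d(\sigma)$, which is what it means for them to lie on geodesics from $I$ to $\sigma$.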

1.2. The geometry of $G_n$. How much can we learn from Theorem 1 about
the global geometry of $G_n$? To answer this question, we need to see how
special a choice it is to sample the points $x$ and $y$ according to the hitting
distribution $\nu$. (The fact that $p = I$ is a fixed reference point is not too
important, due to the transitivity of $G_n$.) We begin with an apparently unrelated
question, which is to ask how large a ball of radius $an$ is.

Theorem 2. If $0 < a < 1$, then as $n \to \infty$, we have $|B(I, an)| \approx (n!)^a$ in
a logarithmic sense, that is,

$\lim_{n \to \infty} \frac{\log |B(I, an)|}{n \log n} = a.$



GEOMETRY OF RANDOM TRANSPOSITIONS 






This result is probably not new, but we have not found it in the literature.
Our original motivation for studying the volume growth in $G_n$ was to try
to understand the phase transition of Theorem 0 in terms of the geometry
of $G_n$. Our first thought was that since the speed was nonsmooth we might
see a change in the volume growth. The above result contradicts this idea.

To put our next two results into perspective it is useful to contrast them
with Brownian motion $B_t$ on a $d$-dimensional manifold of constant negative
curvature $-1$. In that case, if $d(B_t)$ is the distance from the origin,
then (see [11], e.g.) there is a constant $v$ so that

$d(B_t)/t \to v$ as $t \to \infty$.

In the case of Brownian motion on hyperbolic space, rotational symmetry
implies that the hitting distribution is uniform. In contrast, for the random
transposition random walk, we will see in Theorem 5 that the hitting
distribution is asymptotically singular with respect to the uniform distribution
on $\partial B(I, an)$.

Theorem 3. Let $|C_1|$ be the length of the cycle that contains 1. Under
$\mu$, the uniform distribution on $\partial B(I, an)$,

$|C_1| \Rightarrow G,$

where $G$ is a geometric r.v. with $P(G > k) = (b/(1+b))^k$ and $b$ satisfies
$\log(1+b)/b = 1 - a$.
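
The parameter $b$ is defined only implicitly. As an illustration, the sketch below solves $\log(1+b)/b = 1 - a$ by bisection, using that the left-hand side decreases from 1 to 0 as $b$ goes from 0 to $\infty$. The function name `solve_b` and the tolerance are assumptions of this example. Note that $a = 1 - \log 2$ corresponds to $b = 1$.

```python
import math

def solve_b(a, tol=1e-12):
    """Solve log(1+b)/b = 1 - a for b > 0 by bisection, for 0 < a < 1."""
    def f(b):
        return math.log1p(b) / b - (1.0 - a)   # decreasing in b

    lo, hi = 1e-12, 1.0
    while f(hi) > 0:          # enlarge the bracket until f changes sign
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for a in (0.1, 0.25, 1 - math.log(2), 0.5):
        b = solve_b(a)
        print(f"a = {a:.4f}   b = {b:.6f}   P(G = 1) = {1 / (1 + b):.6f}")
```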

To describe the hitting distribution we note that (1.1) suggests that it
will be the same as the distribution of $\sigma_{cn/2}$ where $c = u^{-1}(a)$. When $a > 1/2$
this is much different from the distribution in Theorem 3 since in this case
$c > 1$ and Schramm [12] has shown that $\sigma_{cn/2}$ has cycles of lengths of order
$n$.

Here we will concentrate on what happens when $a < 1/2$ and $c = 2a$. In
this case results in [3] show that as $n \to \infty$, the number of fragmentations
before time $cn/2$ is asymptotically a Poisson random variable with mean
$\kappa(c) = -(\log(1 - c) + c)/2$. In particular,

$P(d(\sigma_{cn/2}) = cn/2) \to e^{-\kappa(c)} = e^{c/2}\sqrt{1 - c}.$

It will be convenient to approach the hitting distribution by the
distribution $\nu_0$ of $\sigma_{cn/2}$ conditioned on no fragmentation. More generally, if $\nu_k$
denotes $\nu$ conditioned on exactly $k$ fragmentations before the hitting time,

$\nu = e^{-\kappa(c)} \sum_{k \ge 0} \frac{\kappa(c)^k}{k!}\, \nu_k + o(1).$

To study $\nu_0$ we recall the connection with random graphs developed in [3]:
when we transpose $i$ and $j$ we draw an edge between $i$ and $j$. In order for the
distance from the identity to increase by 1 at each time, each transposition
must involve indices from two different cycles and will merge them into one.
In terms of the random graph, this means that all components are trees.
Using results from [3], it is straightforward to show:

Theorem 4. Let $C_1$ be the cycle that contains 1. Let $c < 1$.
Under $\nu_0$,

$P(|C_1| = k) \to \frac{1}{c} \frac{k^{k-1}}{k!} (c e^{-c})^k$ for all $k \ge 1$.
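
As a sanity check on the limit law in Theorem 4 (the Borel distribution with parameter $c$), the following sketch evaluates the probabilities in log space and confirms numerically that they sum to 1 and that the mass at $k = 1$ is $e^{-c}$. The name `borel_pmf` and the truncation point are assumptions of this illustration.

```python
import math

def borel_pmf(k, c):
    """(1/c) * k^(k-1)/k! * (c e^{-c})^k, evaluated in log space."""
    log_p = ((k - 1) * math.log(k) - math.lgamma(k + 1)
             + k * (math.log(c) - c) - math.log(c))
    return math.exp(log_p)

if __name__ == "__main__":
    c = 0.5
    total = sum(borel_pmf(k, c) for k in range(1, 2001))
    print("total mass     :", total)
    print("P(|C_1| = 1)   :", borel_pmf(1, c), " vs e^{-c} =", math.exp(-c))
```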

Theorems 3 and 4 show that the uniform distribution $\mu$ and the hitting
distribution $\nu_0$ concentrate on different permutations. In the first case the
number of fixed points will be close to its expected value $nP(|C_1| = 1) =
n/(1 + b)$. In the second it will be close to $ne^{-c}$ by Theorem 4. This is made
precise by the following theorem.

Theorem 5. As $n \to \infty$, the hitting distribution $\nu$ and the uniform
distribution $\mu$ on a sphere of radius $an$ are asymptotically singular:

$d_{TV}(\mu, \nu) \to 1.$

Let $t = \lfloor cn/2 \rfloor$ with $c < 1$. To understand why $\nu_0$ is different from $\mu$ we
will examine the Radon-Nikodym derivative $r(\sigma) = d\nu_0/d\mu$. It is not hard
to show that

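Theorem 6. Suppose $d(\sigma) = t$ and $m_1, \ldots, m_j$ are the cycle lengths of
$\sigma$. The number of paths of length $t$ from $I$ to $\sigma$ is

$t! \prod_{i=1}^{j} \frac{m_i^{m_i - 2}}{(m_i - 1)!}.$

If $t = \lfloor cn/2 \rfloor$ with $c < 1$, then

$r(\sigma) = K_{n,t} \prod_{i=1}^{j} \frac{m_i^{m_i - 2}}{(m_i - 1)!},$

where $K_{n,t}$ is a constant that only depends on $n$ and $t$.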
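
The path-counting formula can be checked by brute force for small $n$. The sketch below is an illustration rather than part of the proof; the helper names are hypothetical and $n = 5$ is chosen only to keep the search small. It runs a breadth-first search from the identity in $G_n$, counts shortest paths to every permutation, and compares the counts with $t! \prod_i m_i^{m_i-2}/(m_i-1)!$.

```python
from itertools import combinations
from math import factorial, prod

def cycle_lengths(perm):
    """Cycle lengths of a permutation stored as a tuple (perm[v] = image of v)."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, v = 0, start
        while v not in seen:
            seen.add(v)
            v = perm[v]
            length += 1
        lengths.append(length)
    return lengths

def formula_count(perm):
    """t! * prod_i m_i^(m_i - 2) / (m_i - 1)!, with t = n - (number of cycles)."""
    ms = cycle_lengths(perm)
    t = len(perm) - len(ms)
    # cycles of length 1 contribute a factor 1, so they are skipped in the numerator
    numerator = factorial(t) * prod(m ** (m - 2) for m in ms if m >= 2)
    denominator = prod(factorial(m - 1) for m in ms)
    return numerator // denominator

def geodesic_counts(n):
    """BFS from the identity in G_n; count[p] = number of shortest paths to p."""
    identity = tuple(range(n))
    swaps = list(combinations(range(n), 2))
    dist, count = {identity: 0}, {identity: 1}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for p in frontier:
            for i, j in swaps:
                q = list(p)
                q[i], q[j] = q[j], q[i]   # multiply p by the transposition (i j)
                q = tuple(q)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    count[q] = 0
                    next_frontier.append(q)
                if dist[q] == dist[p] + 1:
                    count[q] += count[p]
        frontier = next_frontier
    return dist, count

if __name__ == "__main__":
    dist, count = geodesic_counts(5)
    mismatches = [p for p in dist if count[p] != formula_count(p)]
    print("permutations checked:", len(dist), " mismatches:", len(mismatches))
```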

The last result enables us to prove a stronger version of Theorem 5: it
tells us that the "support" of $\nu$ is concentrated on a set that is exponentially
smaller than the size of $\partial B(an)$.

Theorem 7. Suppose $a < 1/2$. There exists a set $S_n \subset \partial B(an)$ such that
$\nu(S_n) \to 1$ as $n \to \infty$ and

$\lim_{n \to \infty} \frac{1}{n} \log \frac{|S_n|}{|\partial B(an)|} = \gamma < 0.$
1.3. The hyperbolic constant under the uniform measure. In Theorem 1,
we learn that if $x$ and $y$ are sampled from $\nu$, roughly speaking, the
Gromov hyperbolicity of the "support" breaks down at $a = 1/4$, that is, the
hyperbolic constant increases suddenly from $O(1)$ to $O(n)$ at this point.

However, the results from the previous section tell us that this "support"
is (exponentially) small with respect to the ambient space. It is therefore
natural to ask what happens to Theorem 1 when we replace $\nu$ with the
uniform measure $\mu$ on $\partial B(an)$. Theorem 8 will show that the qualitative
behavior of the hyperbolic constant remains the same. We prove that there
is a threshold where the expected Gromov inner product $E(\sigma|\pi)_p$ jumps
from $O(1)$ to $O(n)$, but this time the critical value is $a = 1 - \log 2 \approx 0.31$,
rather than $a = 1/4$.

When $\sigma$ and $\pi$ are independent uniform permutations on $\partial B(an)$, by
the transitivity of $G_n$, it is enough to analyze $d(\sigma, \pi)$ to understand $(\sigma|\pi)_p$,
the inner Gromov product. Since $d(\sigma, \pi) = d(I, \sigma^{-1}\pi)$, which has the same
law as $d(I, \sigma\pi)$, it will be enough to characterize the values of $a$ for which
$d(I, \sigma\pi) = 2an + o(n)$ and those for which it is $< 2an$.

Theorem 8. Let $0 < a < 1$ and let $\sigma, \pi$ be two random independent
points chosen uniformly from $\partial B(an)$. Then:

1. If $a < 1 - \log 2$,

E{a\Ti)p < 5{\ognf 

for some < (5 = 5{a) < cxo. Moreover, with probability asymptotically 1, 
there is a geodesic between a and vr that comes within distance at most 
6{logn)^ of p. 

2. If $a > 1 - \log 2$,

$E(\sigma|\pi)_p \sim \delta n$

for some $\delta = \delta(a) > 0$. Moreover, no geodesic can approach $p$ closer than
$\delta' n$ for some $0 < \delta' < \infty$.

Remark. The 0((logn)^) bound in part 1 of the theorem could prob- 
ably be improved into an 0(1) bound (just like in Theorem 1) with some 
more work, but we have not tried to do so. In part 2, by analogy with
Berestycki and Durrett [3], we conjecture that the fluctuations are of order
exactly $n^{1/2}$ in the supercritical regime. More precisely, it should be true
that when $a > 1 - \log 2$,

$n^{-1/2}((\sigma|\pi)_p - \delta n) \Rightarrow \mathcal{N}(0, \kappa),$

where $\delta$ is the limit in part 2 of the theorem, and $\kappa$ is some parameter.
2. Asymptotic hyperbolicity under $\nu$. The first result we prove is
Theorem 1.

Theorem 1. Let $x, y$ be sampled from $\nu$ independently, and set $p = I$,
the identity element.

1. If $a < 1/4$, then there is some $\delta < \infty$ (depending only on $a$), such that

$E(x|y)_p \le \delta.$

Moreover, with probability asymptotically 1, there is a geodesic between
$x$ and $y$ that comes within expected distance $\delta' < \infty$ of $p$.

2. If $a > 1/4$, then

$E(x|y)_p \sim \delta n$

for some $0 < \delta < \infty$. Moreover, no geodesic between $x$ and $y$ can approach
$p$ closer than $\delta' n$ with probability asymptotically 1, where $0 < \delta' < \infty$.



Sketch of the proof. Let $X_t$ and $Y_t$ be two independent random walks
starting at the origin. Let them run until the times $T$ and $T'$ where they
respectively hit the sphere $\partial B(an)$. Then the transitivity of the Cayley graph
of $\mathcal{S}_n$, and the reversibility of the increments of the random walk, imply
that $(X_T, X_{T-1}, \ldots, p, Y_1, \ldots, Y_{T'})$ is a random walk path of length $T + T'$.
Hence the distance between $X_T = x$ and $Y_{T'} = y$ is the same as $d(\sigma_{T+T'})$.
By Theorem 0, $T$ and $T' \sim \frac{1}{2}u^{-1}(a)n$, so applying Theorem 0 again, when
$a < 1/4$, $|x - y| \sim 2an = |x| + |y|$ [the random walk runs for a time $2an < n/2$
and there are only $O(1)$ fragmentations]. For $a > 1/4$, the random walk is
run for time $u^{-1}(a)n$ which, in view of Theorem 0, means that $c = 2u^{-1}(a)$,
and $|x - y| \approx n\,u(2u^{-1}(a)) < 2an$. See Figure 1.

Fig. 1. Two independent random walks run until they hit the sphere of radius $an$.
The claim about the existence of a geodesic that makes the triangle
$(x, p, y)$ thin necessarily involves another argument, since geodesics may be
far apart. However, it is not very hard to construct by hand a geodesic
between the identity and $x$ such that each point of the random walk path is
within $O(1)$ of this geodesic. Applying this construction to the two random
walk paths gives the result of Theorem 1.

Proof of Theorem 1. Let us first deal with the case $a < 1/4$ and prove
that in this case $E(x|y)_p \le \delta$. Keeping the same notation as above, note
that $an + O(1)$ steps are sufficient for $X$ to reach distance $an$. Indeed, after
$an$ steps, $X_{an}$ is at distance $an - \chi_1$ where $\chi_1$ is twice a Poisson random
variable by Theorem 1 in [3]. It is immediate that in $\chi_1$ steps the
probability that $X$ has a fragmentation converges to 0. Therefore $T - an \Rightarrow \chi_1$
(remember that here time is measured discretely). Similarly $T' - an \Rightarrow \chi_2$
where $\chi_1, \chi_2$ are i.i.d. Hence $(x = X_T, X_{T-1}, \ldots, X_1, I, Y_1, \ldots, Y_{T'} = y)$ is a
random walk of $2an + \chi_1 + \chi_2$ steps. In the worst case possible all $\chi_1 + \chi_2$
steps represent "backward" steps (meaning, toward $x$ rather than $y$). Hence
if $\chi_3 = 2an - d(I, \sigma_{2an})$ (so that $\chi_3$ is also twice a Poisson random variable,
but with a different parameter),

$2E(x|y)_p = 2an - E|x - y|$

$\le 2an - E\,d(I, \sigma_{2an}) + E(\chi_1 + \chi_2)$

$\le E(\chi_1) + E(\chi_2) + E(\chi_3) < \infty.$

It is slightly simpler to prove that when $a > 1/4$, $E(x|y)_p \sim \delta n$. Indeed, in
this case, by Theorem 0, we have with high probability that

$\frac{1}{2}u^{-1}(a) - \varepsilon \le T/n \le \frac{1}{2}u^{-1}(a) + \varepsilon.$

Therefore

$\inf_{|t/n - u^{-1}(a)| \le 2\varepsilon} d(I, \sigma_t) \le |x - y| \le \sup_{|t/n - u^{-1}(a)| \le 2\varepsilon} d(I, \sigma_t).$

An easy estimate shows that we are never off by more than $O(n^{1/2})$ if we
evaluate the distance of the random walk by counting the number of clusters
of the random graph rather than the number of cycles of $\sigma_t$. But for the
random graph, the number of clusters is monotone in $t$. Hence, if $\alpha$
denotes $u(2u^{-1}(a))$, we have by continuity of $u$ that

$\alpha - \varepsilon' + o(1) \le \frac{E|x - y|}{n} \le \alpha + \varepsilon' + o(1)$

and $\varepsilon'$ can be made as small as desired by continuity of $u$. Therefore

$\frac{E|x - y|}{n} \to \alpha.$
It suffices now to prove that $\alpha < 2a$, that is, $u(2u^{-1}(a)) < 2a$ or, after change
of variable $c = u^{-1}(a)$, it suffices to prove $u(2c) < 2u(c)$ for all $c > 1/2$. This
fact is a consequence of the sublinearity of $u$: it will be proved later that $u$
is strictly concave on $[1, \infty)$, from which it follows that $u(c) > u(2c)/2$.
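
To see where the threshold $1/4$ comes from, note that for $a < 1/2$ we have $u^{-1}(a) = 2a$, so $\alpha = u(4a)$. The sketch below evaluates $\alpha$ and the resulting limit of $E(x|y)_p/n$, namely $(2a - \alpha)/2$, for a few values of $a$. It assumes a plain truncation of the series (1.1); the function names and the cutoff are choices made for this illustration.

```python
import math

def u(c, terms=5000):
    """Truncated series (1.1) for the limiting speed u(c)."""
    total = 0.0
    for k in range(1, terms + 1):
        log_term = ((k - 2) * math.log(k) - math.lgamma(k + 1)
                    + k * (math.log(c) - c) - math.log(c))
        total += math.exp(log_term)
    return 1.0 - total

def gromov_limit(a):
    """Limit of E(x|y)_p / n for a < 1/2, using u^{-1}(a) = 2a there."""
    alpha = u(4.0 * a)          # alpha = u(2 u^{-1}(a))
    return (2.0 * a - alpha) / 2.0

if __name__ == "__main__":
    for a in (0.10, 0.20, 0.30, 0.40):
        print(f"a = {a:.2f}   alpha = {u(4 * a):.6f}   2a = {2 * a:.2f}"
              f"   (2a - alpha)/2 = {gromov_limit(a):.6f}")
```

The limit vanishes (up to truncation error) for $a < 1/4$ and is strictly positive for $a > 1/4$.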

We now turn to the part of the theorem that concerns geodesics, and
prove that for a random walk $(X_t, t \le cn/2)$ of time-duration $cn/2$ with
$c < 1$, there is a geodesic between $\sigma = X_{cn/2}$ and $I$, that we call $\gamma$, such that

(2.1)  $E \sup_{t \le cn/2} d(X_t, \gamma) = O(1).$

This shows that when $c < 1$ there is a geodesic that stays close to the random
walk path. When $a < 1/4$, $p = I$ is on the random walk path that leads from
$x$ to $y$, so this shows that $E(d(I, \gamma)) = O(1)$, as claimed in the theorem. The
case $a > 1/4$ is trivial by the triangle inequality.

Let $\tau_1, \ldots, \tau_N$ be the sequence of transpositions that are the increments
of the random walk path leading to $\sigma$, so that $\sigma = \tau_1 \cdots \tau_N$. Let $\gamma$ be the
geodesic between $\sigma$ and $I$ defined by $\gamma_0 = \sigma$, $\gamma_1 = \sigma\tau_N$, $\gamma_2 = \gamma_1\tau_{N-1}, \ldots$,
until the first time $t$ such that multiplying $\gamma_t$ by $\tau_{N-t}$ would result in a
coagulation of two cycles of $\gamma_t$. We do not allow this possibility (otherwise $\gamma$
would not be a geodesic), and simply skip $\tau_{N-t}$: $\gamma_{t+1} = \gamma_t\tau_{N-t-1}$. We will see
in a moment that this path never backtracks and that it ends at a bounded
distance from $I$, to which it will be necessary to add a (bounded) number
of steps so that it actually ends at $I$.

Let $n(t)$ be the index of the transposition to be performed at time $t$ on
$\gamma_t$. Note that we can always write

$\gamma_t = \tau_1\tau_2 \cdots \tau_{n(t)} \prod_{i \in K_t} \tau_i,$

where $K_t$ is a set whose size we will show is bounded. Indeed, even when we
skip $\tau_{n(t)}$ in $\gamma_t$ [so that $n(t) \in K_{t+1}$], the following transpositions $\tau_{n(t)-1}, \ldots$
commute with the members of $K_t$ with high probability and they can "jump
above" the terms in $K_t$ and cancel the rest of the transpositions $(\tau_1 \cdots \tau_{n(t)-1})$.

Lemma 1. For all $t$, $E(|K_t|) \le O(1)$, where $O(1)$ is a constant that
depends only on $c < 1$. As a consequence, the path ends at bounded distance
from the identity and $E \sup_{t \le cn/2} d(X_t, \gamma) = O(1)$.

Proof. There are two ways to add a member to $K_{t-1}$ at time $t$. The
first one is that performing $\tau_{n(t)}$ will result in a coagulation, so that it is
skipped by $\gamma$. The other way is if $\tau_{n(t)}$ does not commute with one of the
members of $K_{t-1}$, in which case it stays stuck somewhere in $K_t$.

If $\tau_{n(t)} = (i,j)$, we claim that in order for $i$ and $j$ to be in the same cycle of
$\gamma_t$, $i$ and $j$ must belong to a component of the Erdős-Rényi graph associated
with the random walk that contains a cycle at time $cn/2$. We will prove this
in a moment, but if we admit this, then it follows that all transpositions
in $K_t$ act on vertices that belong to $U(cn/2)$, the unicyclic components of
the random graph at time $cn/2$: if $m \in K_t$, then either $\tau_m = (i,j)$ yields a
coagulation in $\gamma_t$, or it does not commute with some member $(k,l)$ of
$K_{t-1}$, in which case $(i,j)$ overlaps with $(k,l)$. By induction, $k, l \in U(cn/2)$,
therefore so are $i$ and $j$.

Let us prove our claim that if $\tau_{n(t)} = (i,j)$ would yield a coagulation in $\gamma_t$, then
$i, j \in U(cn/2)$. Let us observe first that $i$ and $j$ must already be in the same
component of the random graph: because $\tau_{n(t)}$ was performed on the random
walk, $i$ and $j$ were connected at that point in the random graph and they
remain so. If $i$ and $j$ are in different cycles of $\sigma$, then there must have been
some ulterior fragmentation in their cycles, so the claim holds. When they
are in the same cycle of $\sigma$, then there must be some transposition $\tau_m$ with
$m \in K_t$ such that $i$ and $j$ are in different cycles of $\gamma$ after $\tau_m$. Call those cycles
$C_1$ and $C_2$. $\tau_m$ involves two members $k$ and $l$ of $C_1 \cup C_2$. Moreover the cycle
structure of $\gamma$ before $(k,l)$ is performed must be of the form

$(i, \ldots, k, \ldots, j, \ldots, l, \ldots),$

otherwise $(k,l)$ cannot separate $i$ and $j$ at the next step. Unless $i$ and $j$
belong to a complex component, this implies that the cycle structure of $\sigma$
has the same form. However, this can only happen if $k$ and $l$ were connected
to the component of $i$ and $j$ at different times; otherwise the cycle structure
would be of the form $(i, \ldots, j, \ldots, k, \ldots, l)$ or $(i, \ldots, k, \ldots, l, \ldots, j)$. This implies
in turn the existence of a cycle in the random graph component of $i$ and $j$
at time $cn/2$.

From there it follows in a straightforward way that $|K_t| \le |U(cn/2)|$ (in a
unicyclic component there are as many edges as vertices). It is now standard
in the theory of random graphs to show that $E|U(cn/2)|$ is bounded when
$c < 1$, which completes the proof of the lemma. □

Now let $X_t$ be a point on the random walk path. Since $\gamma$ tries to perform
all $\tau_i$ (in reverse), there is a time $s$ such that $n(s) = t$, that is, the next
transposition to be examined by $\gamma_s$ is $\tau_t$. At this time,

$\gamma_s = \tau_1 \cdots \tau_t \prod_{i \in K_s} \tau_i,$

so that $|K_s|$ steps are enough to reach $\gamma_s$ from $X_t$. Since $E(|K_s|) \le O(1)$ by
the lemma, we have proved that

$E \sup_{t \le cn/2} d(X_t, \gamma) \le O(1)$

and Theorem 1 is proved. □

3. Large deviations and volume growth. The goal of this section is to 
prove Theorem 2, which we restate here for convenience. 

Theorem 2. If $0 < a < 1$, then as $n \to \infty$, we have $|B(I, an)| \approx (n!)^a$ in
a logarithmic sense, that is,

$\lim_{n \to \infty} \frac{\log |B(I, an)|}{n \log n} = a.$

Sketch of the proof. The proof of the result is more interesting than the
limit. We begin by recalling the dynamics of the Chinese restaurant process
(see, e.g., [9]). Customer 1 enters and sits at table 1. At step $i$, customer
$i$ enters and starts a new table with probability $1/i$ or sits to the left of
customer $k$ where $k$ is chosen uniformly at random in $\{1, \ldots, i-1\}$. From the
tables we define a permutation $\sigma$ by $\sigma(i) = i$ if customer $i$ is sitting by
himself at his table and $\sigma(i) = k$ if $k$ sits to the right of $i$. It is easy to see
that this defines a uniform random permutation on $\mathcal{S}_n$, and that the cycle
structure is given by listing the individuals at the tables in clockwise order.
It is well known that if $\sigma \in \mathcal{S}_n$, then $d(\sigma) = n - $ the number of cycles of $\sigma$. In
the Chinese restaurant process construction, let $\zeta_i$ be the random variables
taking the value 1 if customer $i$ sits at an existing table (and 0 otherwise).
The $\zeta_i$'s are independent Bernoulli$(1 - 1/i)$ random variables. Hence, if $\sigma$
is uniformly distributed over $\mathcal{S}_n$, then $d(\sigma)$ has the same distribution as

$S_n = \sum_{i=1}^{n} \zeta_i.$

The $i$'s where a new cycle starts (i.e., $\zeta_i = 0$) are distributed with the same
law as that of the occurrences of records for i.i.d. variables with continuous
distribution function (cf. [5], Example 6.2 of Chapter 1). From calculations
in that example it follows that $(n - S_n)/\log n \to 1$ in probability.
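
The Chinese restaurant construction is easy to simulate. The sketch below follows the seating rule described above; the function names and the two dictionaries (`sigma` for right-hand neighbours, `left_of` for its inverse) are implementation choices of this illustration. It checks on each sample that $d(\sigma) = n - \#\text{cycles}$ coincides with $S_n = \sum_i \zeta_i$.

```python
import random

def chinese_restaurant(n, seed=0):
    """Seat customers 1..n; return (sigma, zeta) with zeta[i-1] = zeta_i."""
    rng = random.Random(seed)
    sigma = {1: 1}      # sigma[i] = customer sitting to the right of i
    left_of = {1: 1}    # left_of[k] = customer whose right-hand neighbour is k
    zeta = [0]          # customer 1 always opens a table
    for i in range(2, n + 1):
        if rng.random() < 1.0 / i:
            sigma[i] = i            # new table: i sits alone
            left_of[i] = i
            zeta.append(0)
        else:
            k = rng.randint(1, i - 1)   # sit immediately to the left of k
            prev = left_of[k]
            sigma[prev] = i
            left_of[i] = prev
            sigma[i] = k
            left_of[k] = i
            zeta.append(1)
    return sigma, zeta

def count_cycles(sigma):
    seen, cycles = set(), 0
    for start in sigma:
        if start not in seen:
            cycles += 1
            v = start
            while v not in seen:
                seen.add(v)
                v = sigma[v]
    return cycles

if __name__ == "__main__":
    n = 10000
    sigma, zeta = chinese_restaurant(n, seed=3)
    print("d(sigma) =", n - count_cycles(sigma), "  S_n =", sum(zeta))
```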

Returning to our calculation of the volume of the ball,

$|B(I, an)| = n!\,P(S_n \le an)$

for all $0 < a < 1$. It is straightforward to generalize large deviation results
for i.i.d. random variables (see, e.g., [5], Section 2.9) to prove Theorem 2.
One begins with the observation that for $\lambda > 0$

(3.1)  $P(S_n \le an) \le e^{\lambda an} E e^{-\lambda S_n},$

optimizes the upper bound over $\lambda$ and uses a change of measure argument
to prove a corresponding lower bound.

Proof of Theorem 2. Let $\{\zeta_i, i \ge 1\}$ be independent with $P(\zeta_i = 1) =
1 - 1/i$ and let $S_n = \sum_{i=1}^{n} \zeta_i$. Since $(\log n!)/(n \log n) \to 1$ it suffices to
show:

Lemma 2. Let $0 < a < 1$. Then

$\lim_{n \to \infty} \frac{\log P[S_n \le an]}{n \log n} = a - 1.$



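Proof. Let $\varphi_n(\lambda) = E[e^{-\lambda S_n}]$. Using the definition we have

$\varphi_n(\lambda) = \prod_{i=1}^{n} \left( \frac{1}{i} + \left(1 - \frac{1}{i}\right) e^{-\lambda} \right) \equiv \prod_{i=1}^{n} q_i,$

where $\equiv$ indicates that the last equation is the definition of $q_i$. By Markov's
inequality we have

(3.2)  $\log P[S_n \le an] \le n\left[\lambda a + \frac{1}{n} \log \varphi_n(\lambda)\right]$  for all $\lambda > 0$.

If we define

$F_\lambda(x) = \frac{1}{\varphi_n(\lambda)} \int_{-\infty}^{x} e^{-\lambda y}\, dF(y),$

where $F$ is the distribution function of $S_n$, then $F_\lambda$ is a distribution function
such that

$\mathrm{mean}(F_\lambda) = -\frac{\varphi_n'(\lambda)}{\varphi_n(\lambda)}$  and  $\mathrm{var}(F_\lambda) = \frac{d}{d\lambda} \frac{\varphi_n'(\lambda)}{\varphi_n(\lambda)} \ge 0.$

To optimize (3.2), we want to choose $\lambda$ so that

$a + \frac{1}{n} \frac{\varphi_n'(\lambda)}{\varphi_n(\lambda)} = 0.$

This says that the mean of the transformed distributions is $na$, so

$a = \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - 1/i) e^{-\lambda}}{1/i + (1 - 1/i) e^{-\lambda}} = \frac{1}{n} \sum_{i=1}^{n} \frac{(i-1) e^{-\lambda}}{(i-1) e^{-\lambda} + 1}.$

We guess that the optimal $\lambda$ must be given (asymptotically) by $e^{-\lambda_{opt}} \sim b/n$.
Plugging this in the above gives

$\frac{1}{n} \sum_{j=1}^{n-1} \frac{jb/n}{(jb/n) + 1} \to \int_0^1 \frac{bx}{bx + 1}\, dx = 1 - \frac{1}{b} \log(b + 1).$

From this we see that we should choose $b$ so that $\log(b+1)/b = 1 - a$.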
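
The Riemann-sum step can be checked numerically. The sketch below evaluates the tilted mean $\frac{1}{n}\sum_{i}(i-1)e^{-\lambda}/((i-1)e^{-\lambda}+1)$ with $e^{-\lambda} = b/n$ and compares it with $1 - \log(1+b)/b$; the function name `tilted_mean_fraction` is an assumption of this example.

```python
import math

def tilted_mean_fraction(n, b):
    """(1/n) * sum_{i=1}^n (i-1)x / ((i-1)x + 1), with x = e^{-lambda} = b/n."""
    x = b / n
    return sum((i - 1) * x / ((i - 1) * x + 1.0) for i in range(1, n + 1)) / n

if __name__ == "__main__":
    for b in (0.5, 1.0, 3.0):
        limit = 1.0 - math.log1p(b) / b
        print(f"b = {b}:  n = 10^5 sum = {tilted_mean_fraction(100000, b):.6f}"
              f"   limit = {limit:.6f}")
```

For $b = 1$ the limit is $1 - \log 2$, the critical radius of Theorem 8.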
Upper bound. Let us calculate what (3.2) gives with this choice of $\lambda$:

(3.3)  $\frac{1}{n} \log \varphi_n(\lambda) = -\log n + \frac{1}{n} \sum_{i=1}^{n} \log\left( \left(1 - \frac{1}{i}\right) b + \frac{n}{i} \right)$

(3.4)  $\sim -\log n + \int_0^1 \log(b + 1/x)\, dx.$

Since the last integral is finite it follows from (3.2) and $\lambda_{opt} = -\log b + \log n$
that

$\limsup_{n \to \infty} \frac{1}{n \log n} \log P(S_n \le na) \le a - 1,$

proving the upper bound half of Lemma 2.

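Lower bound. The argument is similar to that in [5], page 73. Fix any
$\nu < a$ and $\nu < \nu' < a$. Define a real number $b'$ by

$\frac{\log(1 + b')}{b'} = 1 - \nu'.$

For any $\lambda > 0$, writing $F$ for the distribution function of $S_n$,

$P[S_n \le an] \ge \int_{n\nu}^{na} dF(x) \ge \int_{n\nu}^{na} e^{\lambda x} \varphi_n(\lambda)\, dF_\lambda(x)$

$\ge \varphi_n(\lambda) e^{\lambda n \nu} [F_\lambda(na) - F_\lambda(n\nu)].$

First, we prove that we can choose $\lambda$ such that $[F_\lambda(na) - F_\lambda(n\nu)] \to 1$. Recall
that the mean of $F_\lambda$ is $-\varphi_n'(\lambda)/\varphi_n(\lambda)$, and that the latter function starts at
about $n - \log n$ for $\lambda = 0$, is strictly decreasing and equals $na$ when
$\lambda = \lambda_{opt} = -\log b + \log n$, that is, $e^{-\lambda_{opt}} = b/n$. Thus if we pick $\lambda = \lambda'$ such that
$e^{-\lambda'} = b'/n$, the mean of $F_{\lambda'}$ is, by the calculation above, asymptotically $n\nu'$,
and we have chosen $\nu < \nu' < a$. To conclude that $F_{\lambda'}(na) - F_{\lambda'}(n\nu) \to 1$, instead
of using a law of large numbers argument such as in the i.i.d. case, we simply compute
the variance of $F_{\lambda'}$ directly. Anticipating on the calculations of the next
section, breaking the factor $e^{-\lambda S_n}$ in the Radon-Nikodym derivative of $F_\lambda$
into $e^{-\lambda \sum_i \zeta_i}$ means that we can see $F_\lambda$ as the law of a sum of independent
Bernoulli random variables with parameters $\beta_i$, so that the variance is

$\mathrm{var}\, F_{\lambda'} = \sum_{i=1}^{n} \beta_i (1 - \beta_i) \le \sum_{i=1}^{n} \beta_i = \mathrm{mean}(F_{\lambda'}) = n\nu' = O(n).$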
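Another way to obtain this inequality is to do more direct computations,
with $q_i = 1/i + (1 - 1/i)e^{-\lambda}$:

$\mathrm{var}\, F_\lambda = \frac{\varphi_n''(\lambda)}{\varphi_n(\lambda)} - \left( \frac{\varphi_n'(\lambda)}{\varphi_n(\lambda)} \right)^2$

$= \sum_{i=1}^{n} \frac{e^{-\lambda}(1 - 1/i)}{q_i} + \sum_{i \ne j} \frac{e^{-\lambda}(1 - 1/i)\, e^{-\lambda}(1 - 1/j)}{q_i q_j} - \left( \sum_{i=1}^{n} \frac{e^{-\lambda}(1 - 1/i)}{q_i} \right)^2$

$= \sum_{i=1}^{n} \frac{e^{-\lambda}(1 - 1/i)}{q_i} - \sum_{i=1}^{n} \frac{e^{-2\lambda}(1 - 1/i)^2}{q_i^2}.$

Since the variance is $O(n)$, by Chebyshev's inequality we have that $F_\lambda(na) -
F_\lambda(n\nu) \to 1$. Therefore,

$\liminf_{n \to \infty} \frac{\log P[S_n \le an]}{n \log n} \ge \nu - 1.$

But $\nu$ is arbitrarily close to $a$, so the result is proved. □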

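4. The uniform measure on $\partial B(an)$. Let $\{\zeta'_i, 1 \le i \le n\}$ have the
distribution of $\{\zeta_i, 1 \le i \le n\}$ conditional on $\sum_{i=1}^{n} \zeta_i = \lfloor an \rfloor$. Let $\{\zeta_i^{(\lambda)}, 1 \le i \le n\}$
be independent with distribution

$dF_{\lambda,i}(x) = \frac{1}{\varphi_i(\lambda)} e^{-\lambda x}\, dF_i(x),$

where $F_i$, $\varphi_i$ are respectively the distribution function and the Laplace
transform of $\zeta_i$, and $\lambda$ is the optimal parameter of the previous section, $e^{-\lambda} = b/n$.
It is easy to see that $\zeta_i^{(\lambda)}$ is another Bernoulli random variable with

$P[\zeta_i^{(\lambda)} = 1] = P[\zeta_i = 1] \frac{e^{-\lambda}}{\varphi_i(\lambda)} = \frac{1}{1 + n/(b(i-1))} := \beta_i.$

We are now ready to prove: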

We are now ready to prove: 

Theorem 3. Let $|C_1|$ be the length of the cycle that contains 1. Under
$\mu$, the uniform distribution on $\partial B(I, an)$,

$|C_1| \Rightarrow G,$

where $G$ is a geometric r.v. with $P(G > k) = (b/(1+b))^k$ and $b$ satisfies
$\log(1+b)/b = 1 - a$.
Sketch of the proof. The first part of demonstrating this is to recall what
Arratia, Barbour and Tavaré [1] call the Feller coupling. Start with vertex 1
and choose $\sigma(1)$ uniformly from the $n$ possible choices. If this is 1, then take
vertex 2 and choose $\sigma(2)$ uniformly from the $n - 1$ remaining possible choices.
If $\sigma(1) \ne 1$, then choose $\sigma(\sigma(1))$ uniformly from the $n - 1$ remaining choices,
and so on, until the final vertex where there is only one possible choice.
Although the construction is much different from the Chinese restaurant
process, the reader should note that if $\xi_i$ is defined by $\xi_i = 1$ if a cycle is
not completed at the $i$th stage and 0 otherwise, then $\{\xi_i : 1 \le i \le n\}$ and
$\{\zeta_{n-i+1} : 1 \le i \le n\}$ have the same distribution.

From the last observation it follows that $N = \inf\{i : \xi_i = 0\}$ has the same
distribution as the length of the cycle containing 1. We can now conclude the
proof of the theorem, using the large deviation calculation of the volume, and
an argument called the Gibbs conditioning principle (see [4]). This principle
asserts that the $\zeta_i$, conditional on $\sum_{i=1}^{n} \zeta_i = \lfloor an \rfloor$, should
be asymptotically independent, with law given by that which minimizes
the entropy, that is, the random variables $\zeta_i^{(\lambda)}$ with distribution

(4.1)  $dF_{\lambda,i}(x) = \frac{1}{\varphi_i(\lambda)} e^{-\lambda x}\, dF_i(x),$

where $F_i$, $\varphi_i$ are respectively the d.f. and the Laplace transform of $\zeta_i$ and
$\lambda$ is the parameter that optimizes (3.1), that is, $e^{-\lambda} = b/n$.

Proof of Theorem 3. We will first need a lemma.

Lemma 3. For any $n \ge 1$ and for every $\lambda > 0$,

$(\zeta'_1, \ldots, \zeta'_n) \stackrel{d}{=} (\zeta_1^{(\lambda)}, \ldots, \zeta_n^{(\lambda)})$ given $\sum_{i=1}^{n} \zeta_i^{(\lambda)} = \lfloor an \rfloor$.



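Proof. Let $f_1, \ldots, f_n$ be bounded nonnegative Borel functions. Then

$$E\Big[\prod_{i=1}^n f_i(\zeta_i) \,\Big|\, \sum_{i=1}^n \zeta_i = \lfloor an \rfloor\Big]
= \frac{E\big[f_1(\zeta_1) \cdots f_n(\zeta_n);\ \sum_{i=1}^n \zeta_i = \lfloor an \rfloor\big]}
       {P\big[\sum_{i=1}^n \zeta_i = \lfloor an \rfloor\big]}.$$

On the other hand,

$$E\Big[\prod_{i=1}^n f_i(\zeta_i^{(\lambda)});\ \sum_{i=1}^n \zeta_i^{(\lambda)} = \lfloor an \rfloor\Big]
= \frac{e^{-\lambda \lfloor an \rfloor}}{\prod_{i=1}^n \varphi_i(\lambda)}\,
  E\Big[f_1(\zeta_1) \cdots f_n(\zeta_n);\ \sum_{i=1}^n \zeta_i = \lfloor an \rfloor\Big].$$

We can now divide and multiply by the probability of the events in the
two sides of this equation to obtain that for some constant $C > 0$

$$E\Big[\prod_{i=1}^n f_i(\zeta_i^{(\lambda)}) \,\Big|\, \sum_{i=1}^n \zeta_i^{(\lambda)} = \lfloor an \rfloor\Big]
= C\, E\Big[\prod_{i=1}^n f_i(\zeta_i) \,\Big|\, \sum_{i=1}^n \zeta_i = \lfloor an \rfloor\Big].$$

By taking $f_1 = \cdots = f_n = 1$ we see that $C = 1$ and the lemma is proved. □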
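The invariance under exponential tilting can be checked exactly on a small case. The sketch below assumes, as in the Chinese restaurant process, that $\zeta_i$ is Bernoulli with $P(\zeta_i = 1) = (i-1)/i$; the function names and the rational value standing in for $e^{-\lambda}$ are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

def conditional_law(probs, s):
    """Exact law of independent Bernoulli(probs) variables given their sum is s."""
    weights = {}
    for z in product((0, 1), repeat=len(probs)):
        if sum(z) != s:
            continue
        w = Fraction(1)
        for zi, p in zip(z, probs):
            w *= p if zi else 1 - p
        weights[z] = w
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}

def crp_probs(n):
    """Assumed untilted law: P(zeta_i = 1) = (i - 1) / i."""
    return [Fraction(i - 1, i) for i in range(1, n + 1)]

def tilted_probs(n, tilt):
    """Tilted law, where `tilt` plays the role of exp(-lambda)."""
    return [tilt * (i - 1) / (1 + tilt * (i - 1)) for i in range(1, n + 1)]
```

Both conditional laws put weight proportional to $\prod_i (i-1)^{z_i}$ on a configuration $z$ with prescribed sum, which is why the tilt parameter drops out.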



We will need another lemma: 



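Lemma 4. The $\zeta_i^{(\lambda)}$ satisfy a local central limit theorem:

$$n^{1/2}\, P\Big[\sum_{i=1}^n \zeta_i^{(\lambda)} = \lfloor an \rfloor\Big] \to (2\pi\sigma^2)^{-1/2}$$

for a constant $\sigma^2 > 0$ identified in the proof.

Proof. The proof of this local limit theorem follows very closely that
of the usual i.i.d. case, which can be found in Theorem 5.2 of [5]. Let
$\beta_m = P(\zeta_m^{(\lambda)} = 1)$ [i.e., $\beta_m = (1 + n/(b(m-1)))^{-1}$]
and let $X_{m,n} = n^{-1/2}(\zeta_m^{(\lambda)} - \beta_m)$
be the rescaled Bernoulli variable. We start by noticing that the $X_{m,n}$
satisfy the hypotheses of the Lindeberg-Feller theorem (Theorem 4.5 in [5]).
Indeed, they are independent by definition; for all $\varepsilon > 0$,
$P(|X_{m,n}| > \varepsilon) = 0$ as soon as $n^{-1/2} < \varepsilon$, since
$\zeta_m^{(\lambda)} \le 1$ and $\beta_m \le 1$ as well. Moreover,

$$\sum_{m=1}^n E X_{m,n}^2 = \frac{1}{n} \sum_{m=1}^n \beta_m (1 - \beta_m)
\to \int_0^1 \frac{bx}{(1 + bx)^2}\, dx := \sigma^2.$$

Therefore $\sum_{m=1}^n X_{m,n} \Rightarrow N(0, \sigma^2)$. At this point, the
proof of the local limit theorem from [5] can be reproduced exactly. Therefore

$$\sup_x \Big| n^{1/2}\, P\Big[\sum_{m=1}^n X_{m,n} = x\Big] - n(x) \Big| \to 0,$$

where $n(x) := (2\pi\sigma^2)^{-1/2} \exp(-x^2/2\sigma^2)$. Since
$\sum_{m=1}^n \beta_m$ is within $o(n^{1/2})$ of $\lfloor an \rfloor$ by the
choice of $\lambda$, and since $n(\cdot)$ is a continuous function, we can
conclude the proof of the lemma by the above uniform convergence. □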

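Now, by the Feller coupling, $|C_1| = \inf\{k \ge 1 : \zeta_{n-k+1} = 0\}$,
that is, we must reverse the time of the Chinese restaurant process. Hence by
Lemmas 3 and 4:

$$P[|C_1| > k] = P\Big[\zeta_n = 1, \ldots, \zeta_{n-k+1} = 1 \,\Big|\, \sum_{i=1}^n \zeta_i = \lfloor an \rfloor\Big]$$

$$= P\Big[\zeta_n^{(\lambda)} = 1, \ldots, \zeta_{n-k+1}^{(\lambda)} = 1 \,\Big|\, \sum_{i=1}^n \zeta_i^{(\lambda)} = \lfloor an \rfloor\Big]$$

$$= \prod_{i=0}^{k-1} P\big[\zeta_{n-i}^{(\lambda)} = 1\big] \cdot
\frac{P\big[\sum_{i=1}^{n-k} \zeta_i^{(\lambda)} = \lfloor an \rfloor - k\big]}
     {P\big[\sum_{i=1}^{n} \zeta_i^{(\lambda)} = \lfloor an \rfloor\big]}$$

$$\sim \frac{1}{1 + n/(b(n-1))} \cdots \frac{1}{1 + n/(b(n-k))}
\to \Big(\frac{b}{1+b}\Big)^k.$$

Hence Theorem 3 is proved. □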
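For concreteness, the limiting law of $|C_1|$ can be evaluated numerically once $b$ is solved from $\log(1+b)/b = 1-a$. The following is a minimal sketch; the names `solve_b` and `cycle_law` and the bisection tolerance are assumptions made for illustration.

```python
import math

def solve_b(a, tol=1e-13):
    """Solve log(1 + b) / b = 1 - a for b > 0 (0 < a < 1) by bisection."""
    def f(b):
        return math.log1p(b) / b - (1.0 - a)   # decreasing in b
    lo, hi = 1e-12, 1.0
    while f(hi) > 0:                           # enlarge until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cycle_law(a, kmax):
    """Geometric limit law: P(|C_1| = k) = p (1 - p)^(k - 1) with p = 1 / (1 + b)."""
    p = 1.0 / (1.0 + solve_b(a))
    return [p * (1.0 - p) ** (k - 1) for k in range(1, kmax + 1)]
```

For example, $a = 1 - \log 2$ gives $b = 1$, so that $|C_1|$ is asymptotically geometric with parameter $1/2$.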

5. Asymptotic singularity between $\mu$ and $\nu$. In this section we give a
proof of Theorem 5 that follows in an almost straightforward way from
Theorems 2 and 3: $\nu$ and $\mu$ concentrate on permutations that have a
different number of fixed points. First recall the statement of the theorem:

Theorem 5. As $n \to \infty$, the hitting distribution $\nu$ and the uniform
distribution $\mu$ on a sphere of radius $an$ are asymptotically singular:

$$d_{TV}(\mu, \nu) \to 1.$$

Lemma 5. The random partition of $\{1, \ldots, n\}$ derived from $\mu$ is
exchangeable.

Proof. The probability to obtain a certain partition of $\{1, \ldots, n\}$
under $\mu$ only depends on the size of its blocks, which stays the same under
the action of a given permutation. Hence $\mu$ yields an exchangeable
partition of $\{1, \ldots, n\}$. □

An immediate consequence is that the expected number of fixed points
is $n\mu(|C_1| = 1) = n/(1+b)$. Next we show that under $\mu$ the number of
fixed points $N$ is close to its expected value.

Lemma 6.

$$\operatorname{var} N = o(n^2)$$

under $\mu$.

Proof. Let $x_i = 1_{\{\zeta_i = 0,\ \zeta_{i+1} = 0\}}$ be the indicator of
the event that in the conditioned Chinese restaurant process, client number
$i$ sits by himself. Then $N = \sum_{i=1}^n x_i$ and

$$\operatorname{var} N = \sum_{i=1}^n \operatorname{var} x_i + 2\sum_{i<j} \operatorname{cov}(x_i, x_j)
\le n + 2\sum_{i<j} \operatorname{cov}(x_i, x_j).$$

But when $j - i > 1$, by the Gibbs asymptotic independence proved in
Theorem 3, $\operatorname{cov}(x_i, x_j) \to 0$. Also, there are only $O(n)$
terms such that $j = i+1$ and in this case
$\operatorname{cov}(x_i, x_{i+1}) \le 1$, hence the sum
$\sum_{i<j} \operatorname{cov}(x_i, x_j) = o(n^2)$. □

To end the proof of Theorem 5 by Chebyshev's inequality there remains
only to notice that:

Lemma 7. For $0 < a < 1$ and large enough $n$,

$$\nu(|C_1| = 1) \neq \mu(|C_1| = 1).$$

Proof. Recall that $b$ is defined by $\log(1+b)/b = 1 - a$. For $x > 0$,
let $f(x) = 1 - \log(1+x)/x$, so that $b = f^{-1}(a)$ and
$\mu(|C_1| = 1) \to 1/(1+b)$ by Theorem 3.

On the other hand, an easy consequence of Berestycki and Durrett [3] or
Theorem 0 is $\nu(|C_1| = 1) \to e^{-u^{-1}(a)}$. [Indeed, under the hitting
distribution $\nu$, $|C_1|$ is asymptotically the total progeny of a
Poisson-Galton-Watson process, or PGW process, with offspring mean
$u^{-1}(a)$.]

Hence the lemma is proved if we show that

$$1/(1+b) \neq e^{-u^{-1}(a)}, \quad\text{or}\quad u(x) \neq 1 - x/(e^x - 1)$$

for all $x > 0$.

We start by noticing that $u(x) = x/2$ for $x \le 1$, whereas
$1 - x/(e^x - 1) = x/2 - x^2/12 + O(x^4)$ as $x \to 0$.
Hence $u(x) > 1 - x/(e^x - 1)$ as $x \to 0$. The same is true as
$x \to \infty$ [an easy argument shows indeed that
$u(x) = 1 - e^{-x} + o(e^{-x})$, while $1 - x/(e^x - 1) = 1 - xe^{-x}(1 + o(1))$].
Now those functions are both concave as we will see in a moment, hence this
has to stay true on the whole open half-line $x > 0$. (Notice that we have
thus proved that the hitting distribution always has more fixed points than
the uniform distribution: the inequality gives
$e^{-u^{-1}(a)} > 1/(1+b)$.) □
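The inequality $u(x) > 1 - x/(e^x - 1)$ is easy to inspect numerically. The sketch below assumes that formula (1.1) of Theorem 0 reads $u(c) = 1 - \sum_{k \ge 1} \frac{1}{c} \frac{k^{k-2}}{k!} (ce^{-c})^k$, which is the reading consistent with $u(c) = c/2$ for $c < 1$; the function names and the truncation level are hypothetical.

```python
import math

def u(c, terms=4000):
    """Truncated series for u(c): 1 - sum_k k^(k-2) c^(k-1) exp(-k c) / k!."""
    total = 0.0
    for k in range(1, terms + 1):
        log_term = ((k - 2) * math.log(k) + (k - 1) * math.log(c)
                    - k * c - math.lgamma(k + 1))
        total += math.exp(log_term)      # computed in logs to avoid overflow
    return 1.0 - total

def g(x):
    """The comparison function 1 - x / (e^x - 1)."""
    return 1.0 - x / math.expm1(x)
```

Evaluating both functions on a grid such as $x \in \{0.1, 0.5, 1, 2, 4, 8\}$ shows a strictly positive gap $u(x) - g(x)$ at every point, in line with the lemma.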
Lemma 8. The function $u$ appearing in Theorem 0 is concave.

Proof. For $c < 1$ this is obvious. When $c > 1$, rather than carrying out
explicit calculations on the second derivative of $u$, we use a theoretical
argument that exploits the recent result of Schramm [12], which says that the
sizes of the pieces of the giant component in the random graph have
approximately a Poisson-Dirichlet distribution. Since each fragmentation
decreases the distance by 1 and each coalescence increases it by 1, it is easy
to see that

$$\frac{d}{dt} E\big[d(\sigma_t) \,\big|\, \mathcal{F}_t\big]\Big|_{t = cn/2} = 1 - 2P[\text{fragm.} \mid \mathcal{F}_{cn/2}],$$

where $\mathcal{F}_t$ is the canonical filtration generated by the random
walk. So we need to show that $P[\text{fragm.}]$ is an asymptotically
increasing function of $c$. However, the probability of fragmenting a small
cycle is asymptotically 0 (by duality and the fact that $u$ is linear in the
subcritical regime), and the probability of fragmenting one of the giant
cycles can be computed explicitly using the Poisson-Dirichlet structure:

$$P[\text{fragm.}] \to E \sum_{i=1}^\infty (\theta(c) X_i)^2 = \theta(c)^2\, E \sum_{i=1}^\infty X_i^2 = \frac{1}{2}\theta(c)^2,$$

where $\theta(c)$ is the survival probability of a PGW($c$) branching process
and $(X_i, i \ge 1)$ follows the PD(1) distribution.
($E \sum_i X_i^2 = 1/2$ follows from [10], formula (128).) Since $\theta(c)$
is an increasing function of $c$, the lemma is proved (and thus, so is
Theorem 5). □

Remark. Since time $t = cn/2$ corresponds to parameter $c$, the display above
gives $u'(c) = \frac{1}{2}(1 - \theta(c)^2)$. We have thus proved the formula

$$u(c) = \frac{c}{2} - \frac{1}{2}\int_0^c \theta(s)^2\, ds,$$

which is perhaps a little simpler to handle than the expression in Theorem 0.
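The integral representation can be compared numerically with the series of Theorem 0. This sketch makes two assumptions: that (1.1) reads $u(c) = 1 - \sum_{k \ge 1} k^{k-2} c^{k-1} e^{-kc}/k!$, and that $\theta$ is parametrized by the offspring mean, so that $u'(c) = \frac{1}{2}(1 - \theta(c)^2)$. All function names and step counts are hypothetical.

```python
import math

def theta(c, iterations=100):
    """Survival probability of a Poisson(c) Galton-Watson process.

    For c > 1 this is the positive root of 1 - theta = exp(-c * theta).
    """
    if c <= 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # h(mid) = 1 - mid - exp(-c * mid), positive exactly below the root
        if -mid - math.expm1(-c * mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def u_series(c, terms=4000):
    """Truncated series 1 - sum_k k^(k-2) c^(k-1) exp(-k c) / k!."""
    total = 0.0
    for k in range(1, terms + 1):
        total += math.exp((k - 2) * math.log(k) + (k - 1) * math.log(c)
                          - k * c - math.lgamma(k + 1))
    return 1.0 - total

def u_integral(c, steps=4000):
    """c/2 - (1/2) * integral_0^c theta(s)^2 ds, by the midpoint rule."""
    if c <= 1.0:
        return c / 2.0                   # theta vanishes on [0, 1]
    h = (c - 1.0) / steps
    integral = h * sum(theta(1.0 + (i + 0.5) * h) ** 2 for i in range(steps))
    return c / 2.0 - 0.5 * integral
```

The two evaluations can also be compared with the closed form $1 - (1-\theta)\big(1 - c(1-\theta)/2\big)$, which follows from the same derivative identity.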

6. Number of geodesics and Radon-Nikodym derivative. Here we prove
the following theorem, which we will then use to prove a stronger version of
the singularity theorem.

Theorem 6. Suppose $d(\sigma) = t$ and $m_1, \ldots, m_j$ are the cycle
lengths of $\sigma$. The number of paths of length $t$ from $I$ to $\sigma$ is

$$t! \prod_{i=1}^j \frac{m_i^{m_i - 2}}{(m_i - 1)!}.$$

From this it follows that if $t = \lfloor cn/2 \rfloor$ with $c < 1$, then

$$r(\sigma) = K_{n,t} \prod_{i=1}^j \frac{m_i^{m_i - 2}}{(m_i - 1)!},$$

where $K_{n,t}$ is a constant that only depends on $n$ and $t$.

Sketch of the proof. To see the first result, note that in order to go from
$\sigma$ to $I$ in the shortest number of steps we must increase the number of
cycles by 1 at each step, and to do this we must fragment a cycle at each step
by transposing two of its elements. A cycle of length $m_i$ will require
$m_i - 1$ fragmentations. The first step in constructing a path is to decide
on how to allocate the $t$ moves between the original cycles, which can be
done in $t!/\prod_{i=1}^j (m_i - 1)!$ ways. The next step is to count the
number of ways that we can reduce a cycle of length $m_i$ in $m_i - 1$ steps,
which turns out to be simple: $m_i^{m_i - 2}$.
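The path count of Theorem 6 can be confirmed by brute force on a small symmetric group. The sketch below counts shortest paths from the identity by breadth-first search in the Cayley graph generated by all transpositions and compares them with $t! \prod_i m_i^{m_i-2}/(m_i-1)!$; the function names are hypothetical.

```python
from itertools import combinations
from math import factorial, prod

def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a tuple of images of 0..n-1."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return lengths

def formula(lengths):
    """t! * prod_i m_i^(m_i - 2) / (m_i - 1)!  with t = sum_i (m_i - 1)."""
    t = sum(m - 1 for m in lengths)
    trees = prod(m ** (m - 2) for m in lengths if m >= 2)  # fixed points contribute 1
    orders = prod(factorial(m - 1) for m in lengths)
    return factorial(t) * trees // orders

def geodesic_counts(n):
    """Distances and numbers of shortest paths from the identity in S_n."""
    identity = tuple(range(n))
    dist, paths = {identity: 0}, {identity: 1}
    frontier = [identity]
    swaps = list(combinations(range(n), 2))
    while frontier:
        next_frontier = []
        for p in frontier:
            for i, j in swaps:
                q = list(p)
                q[i], q[j] = q[j], q[i]          # multiply by the transposition (i j)
                q = tuple(q)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    paths[q] = 0
                    next_frontier.append(q)
                if dist[q] == dist[p] + 1:       # p is a predecessor on a geodesic
                    paths[q] += paths[p]
        frontier = next_frontier
    return dist, paths
```

For instance, a 4-cycle has $3! \cdot 4^2 / 3! = 16$ geodesics to the identity, and a 3-cycle has 3.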

Proof of Theorem 6. Given a partition of $\{1, 2, \ldots, n\}$ into groups
$A_1, \ldots, A_j$ of sizes $m_i$, $1 \le i \le j$, the number of forests that
consist of trees with vertex sets $A_1, \ldots, A_j$ is, by Cayley's formula
for the number of unrooted trees on $m_i$ vertices,

$$\prod_{i=1}^j m_i^{m_i - 2}.$$

Let $t = \sum_i (m_i - 1)$. A given forest can be built up in $t!$ ways so
there are

$$t! \prod_{i=1}^j m_i^{m_i - 2}$$

paths for our random graph process that end up producing a given partition.
The number of permutations that correspond to a given partition is

$$\prod_{i=1}^j (m_i - 1)!.$$

An equal number of paths end at each permutation with cycle sizes $m_i$,
$1 \le i \le j$, so the number of paths to a given permutation is

$$t! \prod_{i=1}^j \frac{m_i^{m_i - 2}}{(m_i - 1)!}.$$

If t = [cn/2] with c < 1, then the number of edge choices that end up pro- 
ducing no fragmentations is by Theorem 1 in [3] 



t 



Taking the ratio of the last two results gives Theorem 6. □ 
7. The size of the support of the hitting distribution. In this section we
prove Theorem 7, restated below.

Theorem 7. Suppose $a < 1/2$. There exists a set $S_n \subset \partial B(an)$
such that $\nu(S_n) \to 1$ as $n \to \infty$ and

$$\lim_{n \to \infty} \frac{1}{n} \log \frac{|S_n|}{|\partial B(an)|} = \gamma < 0.$$

Sketch of the proof. Obtaining a decay at least exponential is not very
hard, even in the case $a > 1/2$. However, it is not easy to prove that this
is the correct rate for the decay of $|S|/|\partial B|$, and we restrict
ourselves to the case $a < 1/2$.

If $\sigma \in \partial B(an)$, then we can use Theorem 6 to find that

$$\log \nu_0(\sigma) = -an \log n + an + \sum_{k=1}^n a_k \log p_k + o(n),$$

where $p_k$ is the Borel distribution with parameter $c$, and $a_k$ is the
number of cycles of $\sigma$ of size $k$. But by the law of large numbers,
under $\nu_0$ the ratio $a_k/n$ should have a limit as $n \to \infty$. Hence
there is a set $S$ such that $(\log \nu_0(\sigma) + an \log n)/n$ has a limit
$-c_1$ whenever $\sigma \in S$. Because $\nu_0(S) \to 1$,
$|S| \approx \exp(an \log n + c_1 n)$. Moreover it is also true that
$\nu(S) \to 1$. On the other hand, precise estimates on the size of
$\partial B(an)$ obtained via Kolchin's representation theorem tell us that
$|\partial B(an)| = \exp(an \log n + c_2 n + o(n))$. (A statement of
Kolchin's theorem can be found below.) Thus, the theorem holds with
$\gamma = c_1 - c_2$. To prove that $\gamma \neq 0$, we argue that the decay
has to be at least exponential (a consequence of Kolchin's representation
theorem).

Proof of Theorem 7. We will need precise estimates on the size
of $\partial B(an)$. Because we need estimates to order higher than just
$n \log n$, sticking to the large deviation approach is not good enough.
Rather, we will use Kolchin's representation theorem. We would like to thank
Jim Pitman for pointing out this reference to us.

Suppose we can partition $\{1, \ldots, n\}$ into a certain number of clusters,
which can all have different internal states. To be more specific, suppose
that each partition of $\{1, \ldots, n\}$ into $k$ clusters leads to $v_k$
possible global states of the system $\{1, \ldots, n\}$, and that we can
further assign each cluster of size $j$ one of $w_j$ possible internal states.
We call such a combinatorial structure a $(v, w)$-partition (of
$\{1, \ldots, n\}$). Kolchin's representation theorem answers, by
probabilistic means, the following purely combinatorial questions: how many
different $(v, w)$-partitions are there? Also, what does a random, uniform,
$(v, w)$-partition look like?
Before going into the details of this theorem, let us see its relevance to our
problem. The number of permutations at distance $an$ from the identity is a
special instance of the above Kolchin problem, where
$v_k = 1_{\{k = (1-a)n\}}$ and $w_j = (j-1)!$. Indeed a permutation at
distance $an$ is exactly a permutation having $(1-a)n$ cycles and each cluster
of size $j$ can be in one of the $(j-1)!$ possible orderings of the cycle.

Here is the content of Kolchin's theorem (see [10]). Let
$v(\theta) = \sum_{k=1}^\infty v_k \theta^k / k!$ and let
$w(\xi) = \sum_{j=1}^\infty w_j \xi^j / j!$ be the so-called exponential
generating functions of the sequences $v$ and $w$. Let $K$ be an
integer-valued random variable with distribution

$$P(K = k) = \frac{v_k\, w(\xi)^k}{k!\, v(w(\xi))}$$

and let $X$ be a random variable distributed according to

$$P(X = j) = \frac{w_j\, \xi^j}{j!\, w(\xi)}.$$

Here $\xi$ is any parameter. In our setting, $K = (1-a)n$ a.s. and $X$ has the
so-called logarithmic distribution,
$P(X = j) = \frac{\xi^j}{j} \cdot \frac{1}{-\log(1 - \xi)}$, with
normalization $w(\xi) = -\log(1 - \xi)$.

Theorem 9 (Kolchin). The number of $(v, w)$-partitions is given by

$$\frac{n!\, v(w(\xi))}{\xi^n}\, P\Big(\sum_{i=1}^K X_i = n\Big),$$

where the $X_i$ are i.i.d. samples of the variable $X$. Moreover, the sizes of
the clusters in exchangeable random order have the same law as

$$(X_1, \ldots, X_K) \text{ given } X_1 + \cdots + X_K = n.$$



For a precise definition of exchangeable random order, and further discussion
of this theorem, see [10]. It is to be noted that here $\xi$ is any parameter.
By playing on this parameter so as to make the event $S_K = n$ (with
$S_K = X_1 + \cdots + X_K$) not unlikely (e.g., of probability
$\propto n^{-1/2}$ rather than exponentially small), we get that the sizes of
the clusters are approximately drawn from the r.v. $X$. Note that as a
consequence we get here another proof of Theorem 3. Indeed, we see that the
sizes of the cycles of a uniform permutation on $\partial B$ in exchangeable
random order have a logarithmic distribution (asymptotically when the
parameter is chosen suitably). Hence, a size-biased pick $|C_1|$ should have
distribution $P(X' = j) = \text{const} \cdot j \cdot \xi^j / j$, a geometric
random variable. The similarity between the large deviations-statistical
mechanics approach and Kolchin's theorem is striking.

Another straightforward consequence of this theorem is the precise
asymptotics for the size of a sphere of radius $an$. Indeed, in our setting,
$v(\theta) = \theta^{(1-a)n}/[(1-a)n]!$ and $w(\xi) = -\log(1 - \xi)$, hence:

$$|\partial B(an)| = \frac{n!}{\xi^n} \cdot \frac{[-\log(1 - \xi)]^{(1-a)n}}{[(1-a)n]!} \cdot P\Big(\sum_{i=1}^{(1-a)n} X_i = n\Big),$$

where $\xi$ is still any parameter. However, when $\xi$ is chosen such that
$(1-a) E(X) = 1$, the local central limit theorem shows that
$P(\sum_{i=1}^{(1-a)n} X_i = n) \sim C n^{-1/2}$. By Stirling's formula, it is
now straightforward to see that

$$|\partial B(an)| = \exp(an \log n + c_2 n + o(n)).$$
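Kolchin's formula can be tested on small cases, since the number of permutations of $n$ elements with $k$ cycles is an unsigned Stirling number of the first kind. The sketch below assumes the representation $\frac{n!\, w(\xi)^k}{k!\, \xi^n} P(X_1 + \cdots + X_k = n)$ with logarithmic $X_i$; the function names are hypothetical, and the answer should not depend on the free parameter $\xi \in (0, 1)$.

```python
import math

def stirling_first_unsigned(n, k):
    """Number of permutations of n elements with exactly k cycles."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = table[i - 1][j - 1] + (i - 1) * table[i - 1][j]
    return table[n][k]

def kolchin_count(n, k, xi):
    """n! * w(xi)^k / (k! * xi^n) * P(X_1 + ... + X_k = n), X logarithmic."""
    w = -math.log1p(-xi)                                 # w(xi) = -log(1 - xi)
    pmf = [0.0] + [xi ** j / (j * w) for j in range(1, n + 1)]
    dist = [1.0] + [0.0] * n                             # law of the empty sum
    for _ in range(k):                                   # convolve k times
        new = [0.0] * (n + 1)
        for s, ps in enumerate(dist):
            if ps == 0.0:
                continue
            for j in range(1, n - s + 1):
                new[s + j] += ps * pmf[j]
        dist = new
    return math.factorial(n) * w ** k / math.factorial(k) / xi ** n * dist[n]
```

With $n = 8$ and $k = 3$ (the sphere of radius 5 in $S_8$), both computations give 13132 up to floating-point error.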

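Let us now turn our attention to the hitting distribution. We will get the
corresponding estimate by analyzing the Radon-Nikodym derivative $r(\sigma)$
and the law of large numbers for $\nu$, as mentioned in the sketch of the
proof.

More precisely, it follows from the proof of Theorem 6 that if
$\sigma \in \partial B(an)$, with cycle decomposition of sizes
$m_1, \ldots, m_{(1-a)n}$, and $t = an$, then

$$\nu_0(\sigma) = e^{\kappa(c)} \binom{n}{2}^{-t}\, t! \prod_{i=1}^{n(1-a)} \frac{m_i^{m_i - 2}}{(m_i - 1)!}.$$

Let us write $a_k$ for the number of cycles of $\sigma$ of size $k$, so that
$\sum_{k=1}^n a_k = n(1-a)$ and $\sum_{k=1}^n k a_k = n$. We can rewrite the
above as

$$\nu_0(\sigma) = e^{\kappa(c)} \binom{n}{2}^{-t}\, t!\; c^{n(1-a)} (ce^{-c})^{-n} \prod_{k=1}^n \Big[\frac{1}{c} \frac{k^{k-1}}{k!} (ce^{-c})^k\Big]^{a_k}.$$

When we take the logarithm, calling
$q_k = \frac{1}{c} \frac{k^{k-2}}{k!} (ce^{-c})^k$ and $p_k = k q_k$,

$$\log \nu_0(\sigma) = \kappa(c) - t \log \binom{n}{2} + \log t!
+ \sum_{k=1}^n a_k \log p_k + n(1-a) \log c - n \log c + cn.$$

Recalling that $t = an$, $c = 2a$, and using Stirling's formula, we find that

(7.1) $$\log \nu_0(\sigma) = -an \log n + an + \sum_{k=1}^n a_k \log p_k + o(n).$$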
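The Stirling step leading to (7.1) can be sanity-checked numerically. The sketch below assumes that the terms not involving $p_k$ are $-t \log \binom{n}{2} + \log t! + n(1-a) \log c - n \log c + cn$ with $t = an$ and $c = 2a$, and it ignores the term $\kappa(c)$; the function names are hypothetical.

```python
import math

def exact_terms(n, a):
    """-t log C(n,2) + log t! + n(1-a) log c - n log c + c n, with t = an, c = 2a."""
    t = round(a * n)
    c = 2.0 * a
    return (-t * math.log(n * (n - 1) / 2.0) + math.lgamma(t + 1)
            + n * (1.0 - a) * math.log(c) - n * math.log(c) + c * n)

def leading_terms(n, a):
    """The claimed leading behavior -an log n + an."""
    return -a * n * math.log(n) + a * n

def error_per_n(n, a):
    """Difference between the two expressions, divided by n."""
    return (exact_terms(n, a) - leading_terms(n, a)) / n
```

Under these assumptions the difference is of order $\log n$, so `error_per_n` tends to 0 as $n$ grows, consistent with the $o(n)$ remainder in (7.1).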

We would now like to use the law of large numbers for $a_k$, since we know
that under $\nu$

$$\nu\Big(\Big|\frac{a_k}{n} - q_k\Big| < \varepsilon;\ \forall\, 1 \le k \le n\Big) \to 1$$

for given $\varepsilon > 0$, and where $p_k$ is the Borel distribution. But it
is not directly possible to take
$S = \{\sigma \in \partial B : |a_k/n - q_k| < \varepsilon;\ \forall\, 1 \le k \le n\}$
since we would obtain a bound
$\varepsilon \sum_{k=1}^\infty |\log p_k| = \infty$. So we need to modify our
choice: let

$$S_n = \Big\{\sigma \in \partial B : \Big|\frac{a_k}{n} - q_k\Big| < (\log n)^{-5};\ \forall\, 1 \le k \le (\log n)^2 \text{ and } a_k = 0 \text{ otherwise}\Big\}.$$

There are two things we need to check on $S_n$. First we need to see that
it has the property that $\nu(S_n) \to 1$, and also that it has the correct
size asymptotically. The first thing is taken care of by the next lemma, while
the second will follow from the fact that $\nu_0(S_n) \to 1$, itself also a
consequence of the lemma below.

Lemma 9. $\nu(S_n) \to 1$.



Proof. It suffices to prove that $\nu(\partial B - S_n) \to 0$. By the
coupling with an Erdos-Renyi random graph, there is a $\beta > 0$ such that no
cycle can be greater than $\beta \log n$ with high probability under $\nu$.
Hence it remains to prove that

$$\sum_{k=1}^{(\log n)^2} \nu\Big(\Big|\frac{a_k}{n} - q_k\Big| > (\log n)^{-5}\Big) \to 0.$$

The basic idea is to use random graph estimates. Let $G(t)$ be the result
of a random graph process where edges are added in a Poissonian way, each at
rate $\binom{n}{2}^{-1}$. Then the expectation of the number of clusters of
size $k$ in $G(an)$ is known to be $n q_k$ asymptotically with standard
deviation $O(n^{1/2})$, which is much smaller than the $n(\log n)^{-5}$ from
the definition of $S_n$. So we need to show this still holds under $\nu$.
Recall that we can couple the process $(G(t), t \ge 0)$ with a random walk
$(\sigma_t, t \ge 0)$ where we multiply by a transposition $(i, j)$ whenever
edge $(i, j)$ arrives in $G(t)$. Thus we may consider the first time $T$ that
$\sigma$ is at distance $\lfloor an \rfloor$ of the identity, and obviously a
realization of $\nu$ is obtained as $\sigma_T$. We consider also $T'$ the
first time that $\lfloor an \rfloor$ edges have been added to $G(t)$. Then
since $T'$ is the first time a Poisson process with intensity 1 exceeds the
value $\lfloor an \rfloor$, $T'$ has mean $\lfloor an \rfloor$ and variance
$O(n)$. Hence the number of edges added to $G$ between
$\lfloor an \rfloor \wedge T'$ and $\lfloor an \rfloor \vee T'$ is
$O(n^{1/2})$. On the other hand, only a bounded number $Z_{an}$ of edges are
added between $T'$ and $T$, corresponding to the number of fragmentations at
time $an$, and thus all in all only $O(n^{1/2})$ edges are added between
$an \wedge T$ and $an \vee T$, so this may create or destroy at most
$O(n^{1/2})$ clusters of size $k$ for each $1 \le k \le (\log n)^2$, which is
much smaller than the $n(\log n)^{-5}$ from the definition of $S_n$. Hence we
conclude that

$$\sum_{k=1}^{(\log n)^2} \nu\Big(\Big|\frac{a_k}{n} - q_k\Big| > (\log n)^{-5}\Big) \to 0.$$

□

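Thus $\nu(\partial B - S_n) \to 0$. Since $\nu_0$ is obtained as an
asymptotically nondegenerate conditioning of $\nu$, it follows immediately
that $\nu_0(\partial B - S_n) \to 0$ as well, that is, $\nu_0(S_n) \to 1$. We
now show how to use this to estimate the size of $S_n$. By (7.1), for all
$\sigma \in S_n$,

$$\frac{\log \nu_0(\sigma) + an \log n}{n}
\ge a + \sum_{k=1}^{(\log n)^2} \big(q_k + (\log n)^{-5}\big) \log p_k + o(1)
\ge a + \sum_{k=1}^{(\log n)^2} q_k \log p_k + o(1),$$

from which we deduce that

$$\liminf_{n \to \infty} \frac{1}{n} \big(\log \nu_0(\sigma) + an \log n\big) \ge a + \sum_{k=1}^\infty q_k \log p_k := -c_1.$$

After similar treatment for the limsup, we get

$$\frac{1}{n} \big(\log \nu_0(\sigma) + an \log n\big) \to -c_1.$$

Since

$$\nu_0(S_n) = \sum_{\sigma \in S_n} \nu_0(\sigma) \to 1,$$

it must be that $|S_n| = \exp(an \log n + c_1 n + o(n))$. Therefore

$$\lim_{n \to \infty} \frac{1}{n} \log \frac{|S_n|}{|\partial B|} = c_1 - c_2 := \gamma.$$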

It now remains to show that $\gamma \neq 0$. Observe that by Kolchin's
theorem, we could pursue the asymptotic expansion of $|\partial B(an)|$ and
the next term would be polynomial in $n$. From the exact formula of
$\nu(\sigma)$, we could also find the next term for $|S|$ and find that it is
polynomial. Hence if $\gamma = 0$, then we would have
$|S|/|\partial B| \sim n^{-\alpha}$ for some $\alpha > 0$. But, another
consequence of Kolchin's theorem is that the decay has to be at least
exponential: for instance, permutations in $S$ have a number of fixed points
characteristic of $\nu$ and not of $\mu$. As we have seen earlier the number
of fixed points under $\mu$, $n/(1+b)$, is smaller than under the hitting
distribution. But since the number of fixed points under $\mu$ is given by a
sum of almost independent random variables,

$$\sum_{i=1}^{n - \lfloor an \rfloor} 1_{\{X_i = 1\}} \quad\text{given}\quad \sum_{i=1}^{n - \lfloor an \rfloor} X_i = n,$$

we have that

(n— [an] ^ n— [an] \ 

1=1 1=1 / 

by standard large deviations (here, simply Markov's inequality), and because
the event on which we condition is of probability $Cn^{-1/2}$. Hence the decay
has to be at least exponential and $\gamma$ cannot be 0. □

Remark. The same argument shows that the hitting distribution of $\sigma$
is supported on a set at least exponentially small even in the case
$a > 1/2$, but of course we do not know whether this is the precise
asymptotics. If the
decay is still exponential after a = 1/2, it seems likely that the exponential 
coefficient will not be smooth at a = 1/2. In Figure 2 we have plotted the 
value of this coefficient against a time-change of a. It would be interesting 
to compute exact asymptotics in the case a > 1/2 and make this picture 
complete. 

Remark. Kolchin's representation theorem could have been used al- 
ready earlier for the proofs of Theorems 2 and 3. This would actually sim- 
plify the proof of both results. However, we have chosen to keep the proofs 
as they were, because they do not rely on a technical result such as Kolchin's 
theorem, which is not as well known as standard large deviation theory. 

8. Asymptotic hyperbolicity under the uniform measure. Here we present
a proof of Theorem 8. The sketch of the proof below contains some ideas
that will be used and not re-explained in the actual proof that follows.

Theorem 8. Let $0 < a < 1$ and let $\sigma$, $\pi$ be two random independent
points chosen uniformly from $\partial B(an)$. Then:

1. If $a < 1 - \log 2$,

$$E(\sigma|\pi)_p \le \delta (\log n)^2$$

for some $0 < \delta = \delta(a) < \infty$. Moreover, with probability
asymptotically 1, there is a geodesic between $\sigma$ and $\pi$ that comes
within distance at most $\delta (\log n)^2$ of $p$.

2. If $a > 1 - \log 2$,

$$E(\sigma|\pi)_p \sim \delta n$$

for some $\delta = \delta(a) > 0$. Moreover, no geodesic can approach $p$
closer than $\delta' n$ for some $0 < \delta' < \infty$.

Sketch of the proof. To guess what the answer is, we exploit once again
the connection with the theory of random graphs. The first thing to do is to
realize that because of the symmetries of the Cayley graph $G_n$ it is enough
to look at $d(I, \sigma \cdot \pi)$ and see whether it is approximately $2an$
or much smaller than $2an$.
To construct our graph, we will need some notation. Let

(8.1) $$\pi = \tau_1 \tau_2 \cdots \tau_{an}$$

be a minimal decomposition of $\pi$ as a product of transpositions, with the
following convention. If we list all cycles of $\pi$ in the order of their
least element, then the transpositions $\tau_i$ are those $(x, y)$ such that
$y$ comes just after $x$ in the cyclic decomposition of $\pi$ and $x$ and $y$
are in the same cycle, and we order the $an$ transpositions according to their
position in this canonical decomposition. To clarify the ideas, suppose

$$\pi = (1\ 4\ 3\ 7)(2)(5\ 8)(6\ 10\ 9);$$

then we write

$$\pi = (1\ 4)(4\ 3)(3\ 7)(5\ 8)(6\ 10)(10\ 9).$$

We define the graph $\Gamma = (V, E)$ on $n(1-a)$ vertices as follows. Let
$V = \{\text{cycles of } \sigma\}$, and there is an edge between $C$ and $C'$
if there is $x \in C$ and $y \in C'$ such that $(x, y)$ is one of the $an$
transpositions in the minimal decomposition described above. Note that this
graph could have self-loops and multi-edges.
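The canonical decomposition (8.1) is mechanical enough to write down in code. The sketch below stores a permutation as a dictionary on $\{1, \ldots, n\}$ and composes transpositions with the rightmost factor applied first, a convention assumed here because it reproduces the worked example; the function names are hypothetical.

```python
def cycles_by_least_element(perm):
    """Cycles of perm (dict x -> perm(x)), each started at its least element
    and listed in increasing order of that least element."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, x = [start], perm[start]
        seen.add(start)
        while x != start:
            cycle.append(x)
            seen.add(x)
            x = perm[x]
        cycles.append(cycle)
    return cycles

def canonical_transpositions(perm):
    """Transpositions (x, y) with y just after x inside a cycle, in canonical order."""
    return [(c[i], c[i + 1])
            for c in cycles_by_least_element(perm)
            for i in range(len(c) - 1)]

def compose(transpositions, n):
    """Product tau_1 tau_2 ... tau_r on {1..n}, rightmost factor applied first."""
    result = {}
    for x in range(1, n + 1):
        y = x
        for a, b in reversed(transpositions):
            if y == a:
                y = b
            elif y == b:
                y = a
        result[x] = y
    return result

# The example from the text: pi = (1 4 3 7)(2)(5 8)(6 10 9).
EXAMPLE_PI = {1: 4, 4: 3, 3: 7, 7: 1, 2: 2, 5: 8, 8: 5, 6: 10, 10: 9, 9: 6}
```

On `EXAMPLE_PI` the decomposition has $10 - 4 = 6$ factors, matching the distance of $\pi$ from the identity, and multiplying the factors back together recovers $\pi$.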

A notion that we will use on several occasions is that of being a terminal
point. We say $x \in \{1, \ldots, n\}$ is terminal if $x$ does not appear more
than once in the transpositions of the above minimal decomposition. This means
that, with those conventions, $x$ is situated at the "end" of the cycle of
$\pi$ in which it is contained.

Here is why we are interested in the properties of the graph $\Gamma$. If we
define $\sigma_0 = \sigma$ and, for $1 \le r \le an$,
$\sigma_r = \sigma \cdot \tau_1 \cdots \tau_r$, and consider the process
$(\sigma_r, 0 \le r \le an)$, this is a walk on $G_n$, starting at $\sigma$
and ending at $\sigma \cdot \pi$. Moreover, since at each step we are
multiplying by a transposition, the cycles of $\sigma_r$ evolve according to a
discrete coagulation-fragmentation chain, with cycles merging when the
transposition involves elements from different cycles, and cycles splitting
otherwise (as is the case for simple random walk on $G_n$). Therefore,
$\Gamma$ is the graph that results from drawing an edge between two cycles of
$\sigma$ as we encounter a transposition joining those two cycles. In
particular, the same argument that shows that the Erdos-Renyi random graph is
an upper bound for the sizes of the cycles of simple random walk on $G_n$ will
show that the cycles of $\sigma \cdot \pi$ are subcomponents of the connected
components of $\Gamma$, with possibility of fragmentation whenever there is a
cycle in $\Gamma$, or a self-loop or a multiple edge. All other edges
represent coalescence of cycles in the walk $(\sigma_r, 0 \le r \le an)$.

In particular, the property of $\Gamma$ that we will be most interested in
will be to decide whether $\Gamma$ has a giant component, meaning a component
containing a positive fraction of all $n(1-a)$ vertices. Indeed when all
components of $\Gamma$ are small, we should expect very few cycles in $\Gamma$
and hence little fragmentation. Hence most steps of the walk $\sigma_r$ are
coalescence events and the number of cycles decreases linearly; in other
words, in the case that all components are small,
$d(\sigma, \pi) \approx 2an$. On the other hand, if $\Gamma$ contains a giant
component, then we can expect many cycles in the graph and hence many
fragmentation events in the walk $(\sigma_r, 0 \le r \le an)$, which means
that $d(\sigma, \pi) \ll 2an$.

Here is our strategy to see whether there is a giant component in $\Gamma$.
Rather than counting the number of cycles of $\sigma$ that a component of
$\Gamma$ contains, we prefer to compute the exact number of integers in
$\{1, \ldots, n\}$ that it actually encloses prior to shrinking all cycles of
$\sigma$ into points. Formally, this means, give weight $W(C) = |C|$ to any
vertex $C$ of $\Gamma$, and ask what is the total weight of a connected
component of $\Gamma$. Let $C_1(\Gamma)$ denote a size-biased pick from the
connected components of $\Gamma$, that is, the connected component of $\Gamma$
containing "1" [or, more precisely, $C_1(\sigma)$], whose total weight is
$W(C_1(\Gamma))$.

Lemma 10 shows that $W(C_1(\Gamma))$ converges in distribution to the total
progeny of a branching process with offspring distribution a shifted
geometric random variable. The idea is that by Theorem 3, the various cycles
of $\sigma$ are asymptotically i.i.d., so that each edge in $\Gamma$ adds to
the weight of $C_1(\Gamma)$ a contribution which is, by Theorem 3,
asymptotically a geometric random variable $G$ with parameter $1/(1+b)$ where
$b$ satisfies $\log(1+b)/b = 1 - a$. This seems to give an infinite progeny
almost surely (since $G \ge 1$ a.s.). However, to every point that we examine
there is a positive probability that it is a terminal point. In this case,
that integer does not connect to a new independent cycle of $\sigma$, and
hence its offspring is 0. This kind of modified branching process is defined
more precisely and analyzed in Section 8.3. The key fact is that because of
the special properties of the asymptotic law of $\pi$, which involves
geometric random variables, this modified branching process is in fact equal
in distribution to another branching process where the offspring distribution
has been shifted from $G$ to $G - 1$. In all that follows, we call $X$ a
random variable such that

$$X \stackrel{d}{=} G - 1.$$

Hence $\Gamma$ has a giant component if, and only if, $E(G) > 2$. Since
$p = 1/(1+b)$ and $\log(1+b)/b = 1 - a$,

$$P(\Gamma \text{ has a giant component}) > 0 \iff a > 1 - \log 2.$$
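The threshold $a = 1 - \log 2$ can be read off from the branching process directly: $E(X) = E(G) - 1 = b$, so the process is supercritical exactly when $b > 1$. The sketch below parametrizes by $b$ and iterates the generating function $s \mapsto p/(1 - (1-p)s)$ of $X = G - 1$ to obtain the extinction probability; the function names and iteration count are hypothetical.

```python
import math

def radius_from_b(b):
    """The radius parameter a determined by log(1 + b) / b = 1 - a."""
    return 1.0 - math.log1p(b) / b

def survival_probability(b, iterations=5000):
    """Survival probability of the branching process with offspring X = G - 1,
    where G is geometric on {1, 2, ...} with parameter p = 1 / (1 + b)."""
    p = 1.0 / (1.0 + b)
    q = 0.0                                  # extinction probability by generation 0
    for _ in range(iterations):
        q = p / (1.0 - (1.0 - p) * q)        # generating function of X
    return 1.0 - q
```

The iteration converges to the smallest fixed point $\min(1, 1/b)$, so the survival probability is $\max(0, 1 - 1/b)$, which is positive exactly when $b > 1$, that is, when $a > 1 - \log 2$.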
Proof of Theorem 8. 

8.1. Structure of the proof. As this proof is rather long, we feel that it is
appropriate to explain how the various arguments are used. In Section 8.2,
we prove that $W(C_1(\Gamma)) \to_d \sum_{t \ge 0} Z_t$, the total progeny of
a branching process with offspring distribution $X$. Then in Section 8.3, we
define a modified branching process, and prove that in the case of geometric
random variables this becomes another branching process with shifted offspring
distribution. We then use this to prove by hand that in the subcritical case,
$|C_1(\sigma \cdot \pi)|$ is dominated by such a modified branching process.
Since this is a subcritical branching process, we prove the exponential decay
of the tail of $|C_1(\sigma \cdot \pi)|$, uniformly in $n$. This enables us to
show that as long as $a < 1 - \log 2$ there are very few fragmentation events
in the walk $(\sigma_r, r = 0, \ldots, an)$. The supercritical case is treated
in Section 8.5. Since we have established branching process asymptotics, we
can use the duality principle of a branching process between the subcritical
phase and the supercritical phase. This shows that the number of clusters of
$\Gamma$ in the supercritical regime can be computed by looking at the number
of clusters of $\Gamma$ for some specific subcritical time. Since we have
proved that the distance is linear in this regime, we now know how many
clusters $\Gamma$ has at any subcritical time, and it follows that the number
of clusters of $\Gamma$ in the supercritical regime is strictly less than what
it would be if the distance was still linear. It only remains to prove that at
any given time the number of extra cycles that were generated by some
fragmentation (and have not been reabsorbed by other large cycles) is
$O(n^{1/2})$, which is done in Section 8.6.

8.2. Branching process asymptotics. To start proving things, we need
some more notation. Let $A_0^n = \{1\}$ and define recursively the $A_k^n$ by

^fc+i= U U ^1- 

zeA^ l<j<k 

The $A_k^n$ correspond to growing the branching process generation after
generation, rather than cycle after cycle. Let $(Z_t, t = 0, 1, \ldots)$ be a
branching process with offspring distributed as $X$. Note that by the
construction of $\Gamma$, we also have that

$$\sum_{k=0}^\infty |A_k^n| = W(C_1(\Gamma)).$$

Lemma 10. As $n \to \infty$,

$$(|A_0^n|, |A_1^n|, \ldots) \to_d (Z_0, Z_1, \ldots).$$

Proof. Let us start by the convergence of $(|A_0^n|, |A_1^n|)$. If $j = 0$,
$P(|A_0^n| = 1, |A_1^n| = 0) = P(\pi(1) = 1) \to 1/(1+b) := p = P(X = 0)$.
If $j \ge 1$, then

$$P(|A_0^n| = 1, |A_1^n| = j) = P(\pi(1) \neq 1;\ |C_{\pi(1)}(\sigma)| = j)$$

$$= P(\pi(1) \neq 1) \cdot P(|C_{\pi(1)}(\sigma)| = j \mid \pi(1) \neq 1)$$

$$\to (1-p) \cdot (1-p)^{j-1} p = P(X = j).$$

Indeed, conditionally on $\{\pi(1) \neq 1\}$, $\pi(1)$ is uniform on
$\{2, \ldots, n\}$, so that $C_{\pi(1)}(\sigma)$ is as good a size-biased pick
as $C_1(\sigma)$, and we can apply Theorem 3.

Now let us consider the general case of finite-dimensional distributions.
Let $n_1 > 0, n_2 > 0, \ldots, n_k \ge 0$ with $\sum_i n_i \le n$. We are
trying to compute the asymptotics of

$$P(|A_0^n| = 1, |A_1^n| = n_1, \ldots, |A_k^n| = n_k).$$

To do this, we need to evaluate the probability of a collision occurring in
the first $k$ stages, that is,

P ( 'k{x) G y - C^{-k) for some x G with j < A; j . 

\ l<i<k ) 

We will say of an $x$ such as in the event above, that it makes a backward
connection. Hence an $x$ makes a backward connection if $\pi(x)$ maps it to
some lower level in the branching process, but $x$ is not a terminal point.
Therefore backward connections (or collisions) are exactly those that may
lead to a fragmentation, as explained in the sketch of the proof.

It is easy to see that $P(\text{collision in first } k \text{ stages}) = O(1/n)$.
In fact, it follows from the uniformity of Lemma 11 that

P{\Aq \ = no, . . . , \A^\ = n^; b.w. collisions in k first stages) < E 'b' — ^~ — " 

~ n 

(see also Lemma 12 where similar estimates are derived).
Therefore it is enough to consider

$$P(|A_0^n| = 1, |A_1^n| = n_1, \ldots, |A_k^n| = n_k \mid \text{no b.w. connection}).$$

Suppose $A_{k-1}^n = \{x_1, \ldots, x_{n_{k-1}}\}$. Conditionally on the event
that there is no collision in the $k$ first stages,
$\pi(x_1), \ldots, \pi(x_{n_{k-1}})$ belong to yet unexplored cycles (as long
as they are not terminal). After decomposition on the number of such $x$ [call
$T(A_{k-1}^n)$ the number of terminal elements in the set $A_{k-1}^n$], the
last probability is equal to

n.k-l-3 \ 

Ctt{x,){(^)\ =nk]P{T^{Al_i- j\ no b.w. conn.)) 
by the asymptotic independence property of a finite number of size-biased
cycles, and the fact that given there was no backward connection, the $x$ in
level $A_{k-1}^n$ belong to different cycles of $\pi$, so that the events that
they are terminal are independent asymptotically.

These are the transition probabilities of a branching process with offspring
distribution $X$, so the lemma is proved. □

8.3. A modified branching process. One way to formalize the idea that
a vertex has a geometric number of children only during finitely many
generations is to use a modified branching process where each individual $x$
is endowed with a nonnegative, integer-valued random variable $T(x)$ that
represents the "lifetime" of its family. As long as $T(x) > 0$, $x$ will keep
having children according to the original offspring distribution $L$. But when
$T(x) = 0$, the individual will be declared "terminal" and will not be allowed
to have any children.

Here is a rigorous description of this modified branching process. Let
$X_{t,i}$ be a collection of i.i.d. random variables with distribution $L$, a
fixed distribution on the nonnegative integers (the original progeny). Let
$T_{t,i}$ be i.i.d. nonnegative integer-valued random variables, distributed
according to another distribution $L'$, the lifetime. Let $Z_t$ be the size of
the process at time $t$ (with discrete time). Define $Z_0 = 1$, and give the
root lifetime $T_{0,0}$. Then define recursively $Z_t$ by

(8.2) $$Z_{t+1} = \sum_{i=1}^{Z_t} X_{t,i} 1_{\{T(x_i) > 0\}},$$

where $x_1, \ldots, x_{Z_t}$ are the $Z_t$ individuals of generation $t$. If
$y_1, \ldots, y_{Z_{t+1}}$ are the $Z_{t+1}$ individuals of generation $t+1$,
the rule that we adopt for the value of $T(y_1), \ldots, T(y_{Z_{t+1}})$ is
the following. If $T(x_i) > 0$, give all $X_{t,i}$ children of $x_i$
independent lifetimes from $T_{t,i}$, except for one of its children, say
$y_j$, for which $T(y_j) := T(x_i) - 1$. Rigorously, let
$N_t = \#\{i : T(x_i) > 0\}$, rewrite the $x_i$'s removing the terminal ones
and call them $x'_1, \ldots, x'_{N_t}$. Let
$T(y_1) = T(x'_1) - 1, \ldots, T(y_{N_t}) = T(x'_{N_t}) - 1$, and let
$T(y_{N_t+1}) = T_{t+1,N_t+1}, \ldots, T(y_{Z_{t+1}}) = T_{t+1,Z_{t+1}}$.

Of course we make this definition because asymptotically, $W(C_1(\Gamma))$
will be well approximated by such a system, where the offspring $L$ is the
size of a cycle of $\sigma$, and where $L'$ is the size of a cycle of $\pi$.
Indeed, suppose we are exploring the cluster containing 1 in the graph of the
superposition of $\sigma$ and $\pi$. $T(1)$ is $|C_1(\pi)|$, which corresponds
to the fact that after that many iterations of $\pi$ we are back to where we
started and no longer add anything new to the population of the cluster.
However, after one iteration say, all vertices in the first generation, other
than $\pi(1)$ itself, belong to different cycles of $\pi$ with high
probability. Therefore their lifetime should be an independent random
variable, distributed as $L'$.
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In general, the ageing branching process {Zt,t = 0, 1, . . .), where each in- 
dividual has a "lifetime" that it transmits to one of its children, is not a 
Markov process with respect to its own filtration a{ZQ,Zi, . . .). Indeed the 
size of the generation t+1 depends not only on the size of generation t, 
but also on the random variables where x is an individual of generation 
t, so one would need to add in the filtration the values of T{x) for each 
generation. 

However, a miracle happens due to the fact that the cycles of $\pi$ have
(asymptotically) a geometric distribution $G$. Let $p'$ be the parameter of
$G$: $P(G = j) = (1 - p')^{j-1} p'$. Then the distribution $L'$ of the random
variables $T_{t,i}$ is again $G$. For $k \ge 1$, conditionally on $T > k$,
$T - k$ is distributed as $G$. This fact, called "lack of memory," has the
following amazing consequence:

Proposition 1. When the lifetime $L'$ is a geometric random variable,
$(Z_t, t = 0, \dots)$ is a Markovian branching process with offspring
distribution $L\,\mathbf{1}_{\{L' > 0\}}$. When $L = L' = G$, this
distribution is $X = G - 1$.

Proof. Let $B_{t,i}$ be Bernoulli random variables with success parameter
$P(B_{t,i} = 1) = p'$. Because $G \stackrel{d}{=} \inf\{t > 0 : B_{t,i} = 1\}$,
the event $\{T(x_i) > 0\}$ is the same as $\{B_{t,i} = 0\}$, so (8.2) becomes

(8.3)    $Z_{t+1} = \sum_{i=1}^{Z_t} X_{t,i}\,\mathbf{1}_{\{B_{t,i} = 0\}}.$

This expresses the fact that for each new vertex visited, we can take the
decision of closing the cycle, independently of the past. When the cycle
still has some length to be explored, then the vertex has $X_{t,i}$
children. This decision affects the law of progeny at a given vertex. The
new distribution of the progeny is now, by (8.3):

(8.4)    $P(X = j) = \begin{cases} p', & \text{if } j = 0, \\ (1 - p')\,P(L = j), & \text{if } j \ge 1. \end{cases}$

Of course, for our problem, $\sigma \stackrel{d}{=} \pi$, so both $L$ and
$L'$ are distributed as $G$. As can be readily checked from (8.4), the
distribution of $X$ is thus a shift of $G$:

(8.5)    $X = G - 1.$ □
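The passage from (8.4) to (8.5) can be checked numerically. The sketch below is an illustration only; the function names are hypothetical, and it assumes the geometric law is supported on $\{1, 2, \dots\}$ with the same parameter for the offspring and the stopping decision.

```python
def geometric_pmf(p, j):
    """P(G = j) = (1 - p)**(j - 1) * p for j >= 1, and 0 for j = 0."""
    return (1 - p) ** (j - 1) * p if j >= 1 else 0.0


def thinned_offspring_pmf(p_stop, offspring_pmf, j):
    """P(X = j) for X = L * 1{B = 0}, where P(B = 1) = p_stop.

    With probability p_stop the vertex closes its cycle and has no
    children; otherwise it has L children.
    """
    if j == 0:
        return p_stop + (1 - p_stop) * offspring_pmf(0)
    return (1 - p_stop) * offspring_pmf(j)


def shifted_geometric_pmf(p, j):
    """P(G - 1 = j) = (1 - p)**j * p for j >= 0."""
    return (1 - p) ** j * p


if __name__ == "__main__":
    p = 0.3
    for j in range(5):
        lhs = thinned_offspring_pmf(p, lambda k: geometric_pmf(p, k), j)
        print(j, lhs, shifted_geometric_pmf(p, j))
```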

8.4. Fragmentations in the subcritical case. First note that if $\sigma$ is
a uniform permutation on $\partial B(an)$, if we visit all points in
$\{1, \dots, n\}$ according to their order of appearance in the canonical
cyclic decomposition of $\sigma$, and call this process $V_t$
($0 \le t \le n$), then the successive points are in some sense uniformly
chosen from what remains to be found, at least as long as we do not have to
start a new cycle. More precisely:

Lemma 11. Given $(V_0, \dots, V_t)$ and given that $V_t$ is not a terminal
point, $\sigma(V_t)$ is uniform on $\{1, \dots, n\} - \{V_0, \dots, V_t\}$.

The proof of this lemma follows directly from the Feller coupling
presentation of a uniform permutation on $S_n$, which also has (obviously)
this property. Conditioning on the number of cycles does not change how the
cycles are filled in.
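To make the exploration process concrete, here is a small sketch that lists the points of a permutation in order of appearance and marks the terminal ones. It assumes that "canonical" means each new cycle is started at the smallest point not yet visited; that convention, the function name `exploration_order`, and the dict representation of $\sigma$ are choices made here, not taken from the original.

```python
def exploration_order(sigma):
    """Visit the points of a permutation cycle by cycle.

    sigma: dict mapping each point of {1, ..., n} to its image.
    Returns (order, terminal): `order` lists the points as they are
    visited, `terminal` is the set of points whose image closes a cycle.
    Assumption: each new cycle starts at the smallest unvisited point.
    """
    order, terminal, seen = [], set(), set()
    for start in sorted(sigma):
        if start in seen:
            continue
        x = start
        while x not in seen:
            seen.add(x)
            order.append(x)
            if sigma[x] == start:      # the next step would close the cycle
                terminal.add(x)
            x = sigma[x]
    return order, terminal


if __name__ == "__main__":
    sigma = {1: 3, 3: 1, 2: 2, 4: 5, 5: 4}
    print(exploration_order(sigma))
```

For a non-terminal point $V_t$, the image $\sigma(V_t)$ is by construction a point that has not been visited yet, which is the deterministic half of Lemma 11; the uniformity is the probabilistic statement.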

Lemma 12. Suppose the branching process is subcritical, that is, $p > 1/2$
or (equivalently) $a < 1 - \log 2$. Then the number of fragmentations in the
walk $(\sigma_r, r = 0, \dots, n)$ is $o(n)$.

Basically, all cycles are fairly small, so by improving our estimates on the
number of collisions, we should get an $O(1)$ bound, just like in the
Erdős-Rényi case. A technicality arises from the fact that the cycles are
conditioned independent random variables, and not just independent. Here is
a rigorous proof.

Proof of Lemma 12. We prove things in two steps.
First, we prove a uniform bound for the size of a cluster: we show that if
we write $\pi = \prod_{i=1}^{an} \tau_i$ and denote
$\pi_r := \prod_{i=1}^{r} \tau_i$, then

(8.6)    $P(|C_1(\sigma \cdot \pi_r)| > u \text{ for some } r \le an) \le Cn \exp(-\alpha u),$

where $C$ and $\alpha$ are constants independent of $n$, and $u$ is any
number.

Once this exponential control is proved, we can bound the number of times
that one of the $\tau_i$'s will yield a fragmentation. Indeed, recall that
to obtain $\sigma \cdot \pi$ we can perform successively the $\tau_i$'s on
$\sigma$, and each one yields a coagulation or a fragmentation. We hence
view this as a process indexed by $1 \le r \le an$. In the course of this
process, at all times, by (8.6) applied to $u = (\log n)^2$, no cluster is
larger than $(\log n)^2$ with overwhelming probability, so that by Lemma 11:

$P(\tau_{r+1} \text{ yields a fragmentation}) \le 2(\log n)^2 / n.$

There are (exactly) $an$ transpositions to perform, hence:

$E(\#\text{frag.}) \le 2a(\log n)^2.$

This is already more than enough to prove Lemma 12.
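The coagulation/fragmentation dichotomy used here is easy to see in code: multiplying a permutation by a transposition $(i\, j)$ splits a cycle when $i$ and $j$ lie in the same cycle and merges two cycles otherwise. The sketch below counts the fragmentations along a given sequence of transpositions. The function names, the 0-based indexing, and the choice to compose on the right are assumptions made for illustration.

```python
def same_cycle(perm, i, j):
    """True if i and j belong to the same cycle of perm (a list, 0-based)."""
    x = perm[i]
    while x != i:
        if x == j:
            return True
        x = perm[x]
    return False


def count_fragmentations(perm, transpositions):
    """Apply transpositions (i, j), i != j, one by one to a copy of perm.

    Returns (number of fragmentations, final permutation). Each step
    replaces perm by perm composed with (i j), which swaps the images
    of i and j.
    """
    perm = list(perm)
    fragmentations = 0
    for i, j in transpositions:
        if same_cycle(perm, i, j):     # i, j in one cycle: the cycle splits
            fragmentations += 1
        perm[i], perm[j] = perm[j], perm[i]
    return fragmentations, perm


if __name__ == "__main__":
    identity = list(range(4))
    print(count_fragmentations(identity, [(0, 1), (1, 2), (0, 2)]))
```

In the example, the first two transpositions build the 3-cycle on $\{0, 1, 2\}$ by coagulation and the third one fragments it.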

We will now prove that (8.6) holds, since this is the only thing that
remains to be proved. Although we have seen that in the limit each cluster
is a subcritical branching process (for which such an exponential tail of
the total progeny holds), when $n$ is finite there is no real branching
process available to dominate $C_1(\sigma \cdot \pi_r)$, essentially because
the sizes of the cycles are not i.i.d. random variables. However, they are
conditionally independent (cf. Theorem 9, or Theorem 3), and we will use
this fact to construct a real branching process that dominates
$C_1(\sigma \cdot \pi_r)$, when conditioned on some mild event. This
conditioning accounts for the extra factor $n$ in (8.6).

Here is how we proceed. By Kolchin's representation theorem (Theorem 9),
there are random variables $(X_1, \dots, X_{n(1-a)})$ such that the joint
law of the sizes of the cycles of $\sigma$ is $(X_1, \dots, X_{n(1-a)})$
given $\sum_i X_i = n$ (we will call $A_n$ the event that
$\sum_i X_i = n$). The $X_i$'s constitute a "pool" of possible cycle sizes.
Similarly, there are random variables $(Y_1, \dots, Y_{n(1-a)})$ such that
the joint law of the sizes of the cycles of $\pi$ is
$(Y_1, \dots, Y_{n(1-a)})$ given $\sum_i Y_i = n$ (let $B_n$ be the event
that $\sum_i Y_i = n$).

We give an upper bound of $C_1(\sigma \cdot \pi)$ in terms of the modified
branching processes of Section 8.3, that uses only the $X_i$'s and the
$Y_i$'s. Start with vertex 1 and choose a size-biased pick $X'_1$ of the
$X_i$'s (the cycle containing 1). Put $T(1) = Y'_1$, a size-biased pick of
the $Y_i$'s. Next, given $X'_1 = k$, put $T(2) = Y'_2, \dots, T(k) = Y'_k$.
All vertices with positive lifetime $T$ have a number of children given by a
size-biased pick of the remaining $X_i$'s. They transmit their lifetime
minus 1 to one of their children and the rest have lifetimes given by
size-biased picks from the remaining $Y_i$'s. Then repeat the procedure
until we cannot go any further (i.e., until all vertices at a given
generation have lifetime $T = 0$, or until all $X_i$'s and $Y_i$'s have been
picked). Call $Z'$ the total population obtained at the end of this
construction.

We claim that $Z'$ dominates all stages of $C_1(\sigma \cdot \pi_r)$,
because $Z'$ gives the cycles of $\sigma$ coagulated by those of $\pi$,
without taking any account of possible fragmentations. In particular, in
$Z'$, as long as a vertex $x$ is not terminal ($T(x) > 0$), the children of
$x$ will be part of the population of $Z'$. Of course in the event of a
collision or a backward connection, $Z$ does not contain any additional
children, so that $Z \le Z'$. Therefore

$P(Z > u) \le P(Z' > u \mid A_n \text{ and } B_n)$
$\le P(A_n)^{-1} P(B_n)^{-1} P(Z' > u;\, A_n;\, B_n)$
$\le Cn\,P(Z' > u).$

Indeed, by the local central limit theorem (see [5]),
$P(A_n) = P\bigl(\sum_{i=1}^{n(1-a)} X_i = n\bigr)$ is of order $n^{-1/2}$,
and similarly for $P(B_n)$.

To complete the proof, it remains to notice that size-biasing the
logarithmic distribution of Kolchin's theorem gives a geometric random
variable. Therefore, by arguments already developed in the sketch of the
proof, $Z'$ is the total population of a branching process with offspring
distribution $X$ of (8.5), and starting with a geometric number of
individuals $G$. Because $p > 1/2$, this branching process is subcritical.
In this case, classical estimates [2, 5] show the exponential tail

$P(Z' > u) \le C \exp(-\alpha u).$

This concludes the proof of (8.6), and also that of Lemma 12. □
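The claim that size-biasing a logarithmic distribution yields a geometric one can be verified numerically. The sketch below assumes the standard parametrization $P(X = k) = \theta^k / (k \cdot (-\log(1-\theta)))$ for $k \ge 1$; the symbol $\theta$, the function names, and the truncation of the mean at a finite cutoff are assumptions introduced here, and the exact parametrization in Kolchin's theorem is not restated in this section.

```python
import math


def logarithmic_pmf(theta, k):
    """P(X = k) = theta**k / (k * -log(1 - theta)) for k >= 1."""
    return theta ** k / (k * -math.log(1 - theta))


def size_biased_pmf(theta, k, cutoff=2000):
    """P(X* = k) = k * P(X = k) / E[X], with E[X] summed up to `cutoff`."""
    mean = sum(j * logarithmic_pmf(theta, j) for j in range(1, cutoff))
    return k * logarithmic_pmf(theta, k) / mean


def geometric_pmf(p, k):
    """P(G = k) = (1 - p)**(k - 1) * p for k >= 1."""
    return (1 - p) ** (k - 1) * p


if __name__ == "__main__":
    theta = 0.4
    for k in range(1, 6):
        print(k, size_biased_pmf(theta, k), geometric_pmf(1 - theta, k))
```

Under this parametrization the size-biased law is geometric with parameter $1 - \theta$, since the factor $k$ cancels the $1/k$ in the logarithmic weights.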

Remark. It is possible to avoid the use of Kolchin's representation
theorem in the above proof. Indeed, by Theorem 3, a size-biased pick of the
cycles has, after unconditioning on some event of probability oc a dis- 

tribution which is given by the lengths of sequences of 1 in the Bernoulli 
trials c| . However, since Pi = P((}^) = 1) < b/{l + b), it follows that the dis- 
tribution of a size-biased cycle is thus (after unconditioning) stochastically 
dominated by the geometric random variable G. 

8.5. Mean in the supercritical regime and duality. Although this may 
seem a little surprising at first, we use the result from the subcritical case to 
get that for the supercritical case. The idea is to use the duality of branching 
processes. 

A crucial remark is that the number of cycles of $\sigma \cdot \pi$ is given
by $\sum_{x=1}^{n} 1/|C_x(\sigma \cdot \pi)|$, hence by exchangeability

$\frac{1}{n} E(\#\text{clusters of } \Gamma) = E\left(\frac{1}{|C_x(\Gamma)|}\right) \to E\left(\frac{1}{T - 1} \,\middle|\, T > 1\right),$

where $T$ is the total progeny of a branching process with offspring
distributed as $X$ and started with one individual. Indeed, let us not
forget that the first generation $A_0$ of the branching process is itself a
geometric random variable, so we can add an imaginary root and then subtract
it (thus $T - 1$). Introducing an extra vertex for the root allows us to
make use of the duality principle of branching processes [2, 6].

The duality principle states that a supercritical branching process,
conditioned on extinction, is another branching process, subcritical, whose
offspring distribution is given through its generating function. If
$\phi(s) = E(s^X)$ is the generating function of $X$ and $\alpha$ is the
extinction probability $\alpha = P(T < \infty)$, then the conditioned
process has offspring distribution characterized by

$\phi'(s) = \phi(s\alpha)/\alpha$

(here $\phi'$ denotes the generating function of the conditioned process,
not a derivative). Here, $P(X = j) = (1 - p)^j p$, so

$\phi(s) = \frac{p}{1 - s(1 - p)}.$

Therefore

$\phi'(s) = \frac{p/\alpha}{1 - s\alpha(1 - p)}.$

The fixed point equation for $\alpha$ yields that

(8.7)    $\frac{p}{\alpha} + \alpha(1 - p) = 1,$

so that $\phi'$ is the Laplace transform of another shifted geometric random
variable $X'$, with parameter $p' = p/\alpha$. Let $T'$ be the total progeny
of a branching process with offspring $X'$ and started with one individual.
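The duality computation can be checked numerically. The following sketch assumes the shifted geometric offspring law $P(X = j) = (1-p)^j p$, for which the fixed point equation has the two roots $1$ and $p/(1-p)$; the function names are hypothetical and the code is only a sanity check of the algebra, not part of the proof.

```python
def phi(p, s):
    """Generating function of X with P(X = j) = (1 - p)**j * p."""
    return p / (1 - s * (1 - p))


def extinction_probability(p):
    """Smallest root in [0, 1] of phi(p, s) = s, namely min(1, p / (1 - p))."""
    return min(1.0, p / (1 - p))


def dual_parameter(p):
    """Parameter p' = p / alpha of the conditioned (dual) offspring law."""
    return p / extinction_probability(p)


if __name__ == "__main__":
    p = 0.3                                   # supercritical: p < 1/2
    alpha = extinction_probability(p)
    p_dual = dual_parameter(p)
    print(alpha, p_dual)
    print(p / alpha + alpha * (1 - p))        # left-hand side of (8.7)
    for s in (0.0, 0.25, 0.5, 1.0):
        print(s, phi(p, s * alpha) / alpha, phi(p_dual, s))
```

For $p < 1/2$ this gives $p' = 1 - p > 1/2$, so the dual process is indeed subcritical.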
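Let us now relate the supercritical and subcritical regimes. By duality,

$E\left(\frac{1}{T - 1} \,\middle|\, T > 1\right) = \frac{P(T < \infty)}{P(T > 1)}\, E\left(\frac{1}{T - 1}\,\mathbf{1}_{\{T > 1\}} \,\middle|\, T < \infty\right)$
$= \frac{P(T < \infty)}{P(T > 1)}\, P(T' > 1)\, E\left(\frac{1}{T' - 1} \,\middle|\, T' > 1\right).$
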

However, for the subcritical regime, we know by Lemma 12 that there are
only $o(n)$ fragmentations, so the distance between $\sigma$ and $\pi$ is
$2an + o(n)$ and the number of clusters is $(1 - 2a)n + o(n)$. Hence
$E\left(\frac{1}{T' - 1} \,\middle|\, T' > 1\right) = 1 - 2a'$, where $a'$
is the radius corresponding to the conditioned parameter $p'$. Since
$p = 1/(1 + b)$ and $\log(1 + b)/b = 1 - a$, we have that

$a = 1 + \frac{p \log p}{1 - p}.$

On the other hand, due to the fixed point equation (8.7), the constant
$P(T' > 1)/P(T > 1) = (1 - p')/(1 - p)$ simplifies into $\alpha$.
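The relation between the radius $a$ and the parameter $p$ can be confirmed with a short computation, including the critical value $a = 1 - \log 2$ at $p = 1/2$. The function names below are hypothetical; the two formulas are the ones stated in the text.

```python
import math


def radius_from_b(b):
    """The radius a defined by log(1 + b) / b = 1 - a."""
    return 1 - math.log(1 + b) / b


def radius_from_p(p):
    """The closed form a = 1 + p log p / (1 - p), where p = 1 / (1 + b)."""
    return 1 + p * math.log(p) / (1 - p)


if __name__ == "__main__":
    for b in (0.2, 1.0, 3.0):
        p = 1 / (1 + b)
        print(b, p, radius_from_b(b), radius_from_p(p))
    print(1 - math.log(2))   # critical radius, attained at b = 1, p = 1/2
```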

Therefore, Theorem 8 is proved when we show that

$\alpha^2 (1 - 2a') > 1 - 2a.$

Using the fixed point equation (8.7), we find that
$a' = 1 + p \log p' / (\alpha^2 (1 - p))$, so that the above reduces to

$-\alpha^2 - 2\,\frac{p \log p'}{1 - p} > -1 - 2\,\frac{p \log p}{1 - p}$

or

$\frac{2p \log \alpha}{1 - p} > \alpha^2 - 1.$

Using one more time the fixed point equation, one gets

$2p \log \alpha > \alpha - 1.$
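For the extinction probability $\alpha$, we have
$\log \alpha > (\alpha - 1/\alpha)/2$ whenever $0 < \alpha < 1$. Moreover,
the root of (8.7) other than 1 is $\alpha = p/(1 - p)$, so that
$p/\alpha = 1 - p$ and $(1 - p)(\alpha + 1) = 1$. It follows that
$2p \log \alpha > (1 - p)(\alpha^2 - 1) = \alpha - 1$ as soon as
$\alpha < 1$, that is, $2p < 1$,
which is precisely the condition that the branching process is supercritical.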
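As a sanity check of the chain of reductions, the sketch below evaluates both the original inequality $\alpha^2(1 - 2a') > 1 - 2a$ and the reduced one $2p\log\alpha > \alpha - 1$ over a grid of supercritical parameters. It assumes $\alpha = p/(1-p)$ and $p' = p/\alpha$; the function names are hypothetical, and a numerical check of this kind is an illustration, not a substitute for the proof.

```python
import math


def radius(p):
    """a = 1 + p log p / (1 - p)."""
    return 1 + p * math.log(p) / (1 - p)


def full_gap(p):
    """alpha^2 (1 - 2a') - (1 - 2a); positive means the inequality holds."""
    alpha = p / (1 - p)
    p_dual = p / alpha
    return alpha ** 2 * (1 - 2 * radius(p_dual)) - (1 - 2 * radius(p))


def reduced_gap(p):
    """2 p log(alpha) - (alpha - 1), the last form of the inequality."""
    alpha = p / (1 - p)
    return 2 * p * math.log(alpha) - (alpha - 1)


if __name__ == "__main__":
    for p in (0.05, 0.15, 0.25, 0.35, 0.45):
        print(p, full_gap(p), reduced_gap(p))
```

The two gaps differ exactly by the factor $1 - p$, and both shrink to $0$ as $p$ increases to $1/2$.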
8.6. Fragmentations in the supercritical range. In the previous section
we have computed asymptotics for the expected number of clusters in the
graph resulting from the superposition of the cycle structures of $\sigma$
and $\pi$. We now need to show that at the end of the walk
$(\sigma_r, r = 0, \dots, an)$, there are no more than $o(n)$ additional
cycles that have been generated by fragmentation, compared to the number of
clusters of $\Gamma$.

To do this, we use once again the dynamic point of view adopted to deal
with the subcritical regime. Let $\tau_1, \dots, \tau_{an}$ be the
decomposition of $\pi$ into a product of $an$ transpositions as described
earlier, and let $\sigma_r = \sigma \cdot \tau_1 \cdots \tau_r$.

Lemma 13. For each $1 \le r \le an$, the expected number of cycles in
$\sigma_r$ generated by fragmentation is $O(n^{1/2})$.

This is similar to the Erdős-Rényi case of Berestycki and Durrett [3],
Theorem 3. Lemma 13 does not claim that the number of fragmentations itself
is $O(n^{1/2})$, but that the number of extra cycles generated by
fragmentation is $O(n^{1/2})$. Just like in the Erdős-Rényi case, many of
the cycles that are fragmented get reabsorbed by large components fairly
quickly.

Proof of Lemma 13. There can never be more than $n^{1/2}$ cycles of
size larger than $n^{1/2}$. On the other hand, by Lemma 11, the probability
that $\tau_r$ will create a fragment of size smaller than $n^{1/2}$ is at
most $n^{1/2}/n = n^{-1/2}$. Therefore the expected number of such
fragmentations is at most $an \cdot n^{-1/2} = O(n^{1/2})$. □
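The choice of the threshold $n^{1/2}$ can be understood as a balance between the two counts in the proof. The following worked computation uses a hypothetical threshold $n^{\beta}$ with $0 < \beta < 1$ (the symbol $\beta$ does not appear in the original) and assumes the per-step probability of creating a small fragment scales as $n^{\beta}/n$:

```latex
\begin{align*}
\#\{\text{cycles of size} > n^{\beta}\} &\le \frac{n}{n^{\beta}} = n^{1-\beta}, \\
E\,\#\{\text{fragments of size} < n^{\beta}\} &\le an \cdot \frac{n^{\beta}}{n} = a\,n^{\beta}, \\
n^{1-\beta} + a\,n^{\beta} &= O\bigl(n^{\max(\beta,\,1-\beta)}\bigr),
\quad \text{which is smallest for } \beta = 1/2 .
\end{align*}
```

With $\beta = 1/2$ both contributions are of order $n^{1/2}$, which is the bound stated in Lemma 13.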

At this point, Theorem 8 is proved. □